Everything has its own pros and cons, even your favourite programming language, Python. One of Python's best-known features is the GIL, which stands for "Global Interpreter Lock". It is a mechanism used in the CPython interpreter to synchronize the execution of threads so that only one thread can execute Python bytecode at a time. This means that even on a multi-core processor, a single CPython process runs Python code on only one core at any given moment.
The GIL was originally introduced to simplify the implementation of the CPython interpreter. It makes it easier to write thread-safe CPython extensions, but it also limits the parallelism of Python programs and has become a performance bottleneck for CPU-bound programs.
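To see the CPU-bound bottleneck for yourself, here is a minimal sketch. The count_down workload and the sizes are arbitrary choices for illustration. On a standard CPython build with the GIL, the threaded run usually takes about as long as the sequential one, because only one thread executes bytecode at a time. Exact timings depend on the machine and Python version, and the experimental free-threaded builds available from Python 3.13 can behave differently.

```python
import sys
import threading
import time


def count_down(n: int) -> None:
    # Pure-Python loop: the thread needs the GIL for every bytecode it runs
    while n > 0:
        n -= 1


def run_sequential(n: int, workers: int) -> float:
    """Run `workers` count-downs one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int) -> float:
    """Run `workers` count-downs in separate threads and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4
    # CPython asks the running thread to drop the GIL roughly every switch interval
    print(f"GIL switch interval: {sys.getswitchinterval()} s")
    print(f"sequential: {run_sequential(N, WORKERS):.2f} s")
    print(f"threaded:   {run_threaded(N, WORKERS):.2f} s")
```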
So how is this related to memory leaks in multithreaded code, and why am I talking about the GIL? Good question!
Before getting into it, we need to understand when to use multithreading, multiprocessing, or asyncio in Python. A common rule of thumb: when a task is CPU-bound, use multiprocessing; when it is I/O-bound with fast I/O and a small number of connections, use multithreading; and when it is I/O-bound with slow I/O and a large number of connections, use asyncio.
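For the slow, many-connection case, here is a minimal asyncio sketch. The fake_request coroutine is a hypothetical placeholder in which asyncio.sleep stands in for a real network call. A thousand one-second waits should finish in roughly one second on a single thread, without creating one thread per connection.

```python
import asyncio
import time


async def fake_request(delay: float) -> float:
    # Hypothetical stand-in for a slow network call: while this coroutine
    # awaits, the event loop runs the other coroutines on the same thread
    await asyncio.sleep(delay)
    return delay


async def main(n_connections: int, delay: float) -> list:
    # Start all "requests" at once and wait for every one of them
    return await asyncio.gather(*(fake_request(delay) for _ in range(n_connections)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main(n_connections=1_000, delay=1.0))
    finish = time.perf_counter()
    print(f"{len(results)} requests finished in {round(finish - start, 2)} second(s)")
```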
Here is a program to demonstrate multithreading in Python:
# Multithreading.py
from threading import Thread
import time


def func(item):
    # time.sleep() releases the GIL, so the other threads keep running while this one waits
    time.sleep(item)


start = time.perf_counter()

threads = []
for i in range(1, 13):
    # Thread i sleeps for i seconds (1 to 12)
    t = Thread(target=func, args=(i,))
    threads.append(t)
    t.start()

# Wait for every thread to finish
for t in threads:
    t.join()

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
# OUTPUT
Finished in 12.01 second(s)
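The twelve threads sleep for 1 to 12 seconds each, yet the program finishes in about 12 seconds instead of the 78 seconds (1 + 2 + ... + 12) a sequential loop would need, because a sleeping thread releases the GIL. Now, how can the GIL contribute to a memory leak in Python? Strictly speaking, it is not a true memory leak. When running I/O-bound Python programs, such as making API calls, reading from and writing to disk, or opening connections to web servers or database clients, standard blocking I/O calls release the GIL while they wait, allowing other threads to run Python code in the meantime. This is why such programs benefit from threads: they spend most of their time waiting for I/O operations to complete.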
But suppose you use multithreading to load a ton of heavy libraries while creating multiple connections to different APIs or databases and storing them. This can lead to what looks like a memory leak. At some point a thread will sleep for a second or so (or wait on a response), the GIL will be released, and the next thread will start in the meantime. That thread repeats the whole process of loading heavy libraries and creating API and database connections, taking up memory once again, and that memory is not released until the work inside the thread has completed. (Strictly speaking, a module brought in with import is cached in sys.modules and loaded only once per process; what gets duplicated is the heavy objects each thread creates, such as clients, connections, or loaded data.)
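Here is a minimal sketch of the effect, assuming a hypothetical worker whose 5 MiB bytearray stands in for the heavy libraries and connections each thread sets up. It uses the standard-library tracemalloc module to record Python allocations: the peak should grow roughly linearly with the number of threads alive at the same time, and the current usage drops back once they have all been joined, which matches the point that this is not a leak in the strict sense.

```python
import threading
import time
import tracemalloc

# Hypothetical size of the per-thread "heavy" state (think: a client or a loaded model)
PAYLOAD_BYTES = 5 * 1024 * 1024


def worker(wait_seconds: float) -> None:
    # Each thread builds its own heavy object...
    resource = bytearray(PAYLOAD_BYTES)
    # ...then blocks; time.sleep releases the GIL, so the next thread starts and
    # allocates its own copy while this one still holds on to its copy.
    # `resource` is freed only when worker() returns.
    time.sleep(wait_seconds)


def measure(n_threads: int, wait_seconds: float = 0.2) -> tuple:
    """Return (current, peak) traced bytes for n_threads concurrent workers."""
    tracemalloc.start()
    threads = [threading.Thread(target=worker, args=(wait_seconds,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, peak


if __name__ == "__main__":
    for n in (1, 4, 8):
        current, peak = measure(n)
        print(f"{n} thread(s): peak {peak / 2**20:.1f} MiB, after join {current / 2**20:.1f} MiB")
```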
The same applies if we run a highly CPU-bound task that also includes some I/O-bound steps in multiple threads: whenever the GIL is released, another thread starts and allocates its own working memory, so the incomplete threads together take up extra memory, which looks like a severe memory leak.
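The same measurement can be applied to the mixed CPU-plus-I/O case. In this sketch the hypothetical mixed_task computes a sum while holding the GIL, then sleeps (releasing it), and keeps a 4 MiB buffer alive the whole time. Running 16 such tasks on a concurrent.futures.ThreadPoolExecutor with different max_workers values suggests that the peak is governed by how many tasks are unfinished at once. Capping the pool size is one common way to bound that cost, at the price of less concurrency.

```python
import time
import tracemalloc
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-task working memory
PAYLOAD_BYTES = 4 * 1024 * 1024


def mixed_task(n: int) -> int:
    buffer = bytearray(PAYLOAD_BYTES)     # per-task state, alive until the task returns
    total = sum(i * i for i in range(n))  # CPU-bound part: runs while holding the GIL
    time.sleep(0.05)                      # I/O-bound part: the GIL is released here
    buffer[0] = total % 256               # touch the buffer so it is still in use
    return total


def peak_for_pool(max_workers: int, n_tasks: int = 16) -> tuple:
    """Return (peak traced bytes, results) for n_tasks run on max_workers threads."""
    tracemalloc.start()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(mixed_task, [10_000] * n_tasks))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, results


if __name__ == "__main__":
    for workers in (2, 16):
        peak, _ = peak_for_pool(workers)
        print(f"max_workers={workers}: peak {peak / 2**20:.1f} MiB")
```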
This is what I ran into, and the explanation above is what I concluded after researching the GIL and multithreading. Please share any difficulties you have encountered while working with multiple threads in Python.