Multiprocessing vs Threading

Ref: http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python Ref: http://ls.pwd.io/2014/08/python-threads-vs-processes/ REf: https://www.quantstart.com/articles/parallelising-python-with-threading-and-multiprocessing

The threading module uses threads, the multiprocessing uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock (GIL) is for.

Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.

Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).

If you want to parallelize a computation, you're probably going to need multithreading, because you probably want the threads to cooperate on the same memory.

Speaking about performance, threads are faster to create and manage than processes (because the OS doesn't need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other's memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.

Unfortunately the internals of the main Python interpreter, CPython, negate the possibility of true multi-threading due to a process known as the Global Interpreter Lock (GIL). The GIL is necessary because the Python interpreter is not thread safe.

# thread_test.py

import random
import threading


def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a 
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())

if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    threads = 2   # Number of threads to create

    # Create a list of jobs and then iterate through
    # the number of threads appending each thread to
    # the job list 
    jobs = []
    for i in range(0, threads):
        out_list = list()
        thread = threading.Thread(target=list_append(size, i, out_list))
        jobs.append(thread)

    # Start the threads (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the threads have finished
    for j in jobs:
        j.join()

    print "List processing complete."

PreviousPriority Queue NextCommon Coding

Last updated 5 years ago

Was this helpful?