prepbook
  • Introduction
  • Some common stuff
    • python __repr__
    • HackerRank input tips
  • Data Structures and Algorithms
    • Breadth first search
    • Depth First Search
    • Dijkstra
    • A* Search Algorithm
    • Binary Search
    • python counter
    • Sorting
      • Merge Sort
      • Quick Sort
    • Priority Queue
  • Multiprocessing vs Threading
  • Common Coding
    • Find loop in lin list
    • Maximum sum subarray
  • Coding
    • Valid palindrome
    • Palindrome number
    • Remove duplicates from sorted array
    • Island perimeter
    • Serialize and Deserialize Binary Tree
    • Valid Soduku
    • Word Pattern
    • Word Pattern II
    • Group Anagrams
    • Implement Trie
    • Deep copy list with random node
    • Palindrome Permutation
    • Combination Sum
    • Clone Graph
    • Generate parenthesis
    • Fibonacci Number
    • LRU Cache
    • Merge two sorted arrays in place
    • Hamming Distance
    • Merge K sorted arrays
    • Kth smalles element in BST
    • Kth largest element in an array
    • Remove duplicates from sorted list
    • Power of 2
    • Nested list weight sum
    • SIngle number in a list
    • Factor combinations
    • Delete node from BST
  • hacker Rank
    • Coding
      • print staircase
      • Drawing book
      • Challenge 0
      • Min-Max sum
  • WorkRelatedCoding
    • Rectangle Overlap
  • Python tips
Powered by GitBook
On this page

Was this helpful?

Multiprocessing vs Threading

PreviousPriority QueueNextCommon Coding

Last updated 5 years ago

Was this helpful?

Ref: Ref: REf:

The threading module uses threads, the multiprocessing uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock (GIL) is for.

Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.

Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).

If you want to parallelize a computation, you're probably going to need multithreading, because you probably want the threads to cooperate on the same memory.

Speaking about performance, threads are faster to create and manage than processes (because the OS doesn't need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other's memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.

Unfortunately the internals of the main Python interpreter, CPython, negate the possibility of true multi-threading due to a process known as the Global Interpreter Lock (GIL). The GIL is necessary because the Python interpreter is not thread safe.

# thread_test.py

import random
import threading


def list_append(count, id, out_list):
    """
    Creates an empty list and then appends a 
    random number to the list 'count' number
    of times. A CPU-heavy operation!
    """
    for i in range(count):
        out_list.append(random.random())

if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    threads = 2   # Number of threads to create

    # Create a list of jobs and then iterate through
    # the number of threads appending each thread to
    # the job list 
    jobs = []
    for i in range(0, threads):
        out_list = list()
        thread = threading.Thread(target=list_append(size, i, out_list))
        jobs.append(thread)

    # Start the threads (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the threads have finished
    for j in jobs:
        j.join()

    print "List processing complete."
http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python
http://ls.pwd.io/2014/08/python-threads-vs-processes/
https://www.quantstart.com/articles/parallelising-python-with-threading-and-multiprocessing