
I am trying to learn how to incorporate multiprocessing capabilities into my Python code. After reading through the documentation and several tutorials, I think I have an understanding of what my code should look like. I just want to be sure that I am not leaving out something important:

from multiprocessing import Pool

def job(args):
    """Your job function"""

if __name__ == '__main__':
    # Sets up a process pool. Defaults to number of cores.
    # Each input gets passed to job and processed in a separate process.

    inputs = [
        'file_1.txt',
        'file_2.txt',
        'file_3.txt'
    ]

    Pool().map(job, inputs)

closed as off-topic by 200_success Jul 16 '14 at 19:41

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions must involve real code that you own or maintain. Questions seeking an explanation of someone else's code are off-topic. Pseudocode, hypothetical code, or stub code should be replaced by a concrete example." – 200_success

Is this not the complete code? –  Jamal Jul 16 '14 at 17:12

@Jamal: thank you. Yes, this is my complete multiprocessing boilerplate code. Does it look correct? –  fire_water Jul 16 '14 at 17:59

I'm not familiar with Python, so I'm not certain. I'm just making sure that it's ample code for others to review. –  Jamal Jul 16 '14 at 17:59

You can see a complete example in my GitHub: github.com/SalvaJ/Learning/blob/master/hilos2.py –  Trimax Jul 16 '14 at 19:19

1 Answer

You should store your pool object so you can close and join it correctly:

from multiprocessing import Pool

def job(args):
    """Your job function"""

if __name__ == '__main__':
    # Sets up a process pool. Defaults to number of cores.
    # Each input gets passed to job and processed in a separate process.

    inputs = [
        'file_1.txt',
        'file_2.txt',
        'file_3.txt'
    ]

    p = Pool()
    p.map(job, inputs)

    # Closing and joining a pool is important to ensure all resources are freed properly
    p.close()
    p.join()
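
If you're on Python 3.3 or later, Pool can also be used as a context manager, with one caveat: exiting the with block calls terminate() rather than close()/join(), so you still want to close and join explicitly if you care about a graceful shutdown. A minimal sketch (the line-counting job function here is a made-up stand-in):

from multiprocessing import Pool

def job(path):
    """Hypothetical job: count the lines in one file."""
    with open(path) as f:
        return sum(1 for _ in f)

if __name__ == '__main__':
    inputs = ['file_1.txt', 'file_2.txt', 'file_3.txt']

    with Pool() as p:                 # __exit__ calls terminate(), not close()
        results = p.map(job, inputs)  # blocks until every input is processed
        p.close()                     # no more tasks will be submitted
        p.join()                      # wait for the workers to exit cleanly

    print(results)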

Sidenote: When you post links to expanded code, make sure the rest of the code matches what you're posting here. Your link leads to a thread-based implementation of your algorithm, and handling threads is different from handling a pool of workers from the multiprocessing library.
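
If the linked code is thread-based on purpose, note that multiprocessing.dummy exposes a thread-backed Pool with the same interface, so the boilerplate above carries over almost unchanged. A minimal sketch (the I/O-bound job function is again a made-up example):

import os
from multiprocessing.dummy import Pool as ThreadPool  # threads, same Pool API

def job(path):
    """Hypothetical I/O-bound job: return a file's size in bytes."""
    return os.path.getsize(path)

if __name__ == '__main__':
    inputs = ['file_1.txt', 'file_2.txt', 'file_3.txt']

    pool = ThreadPool(4)  # four worker threads
    try:
        results = pool.map(job, inputs)
    finally:
        pool.close()
        pool.join()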

Thank you! A script that once took 3 hours to run now takes 20 minutes thanks to your help & Python's multiprocessing library! Question: does each process contain its own copy of the script? For example, if my script contains a global variable that is updated while each file is being processed, does each process contain its own version of that global variable? –  fire_water Jul 17 '14 at 18:46

Instead of posting your question in the comments, you should post it as a new question (with your code) over at stackoverflow. Make sure you check for duplicates before posting, because there are a lot of existing questions about multiprocessing. –  skrrgwasme Jul 17 '14 at 18:50
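
For what it's worth, the short answer to the comment above is yes: with multiprocessing, each worker process gets its own copy of the script's globals, so updates made in one worker are not visible to the other workers or to the parent. A quick sketch demonstrating this (the exact values printed depend on how tasks happen to be distributed among the workers):

from multiprocessing import Pool

counter = 0  # module-level global

def job(path):
    """Increment the global and report its value in this worker."""
    global counter
    counter += 1
    return counter

if __name__ == '__main__':
    with Pool(4) as p:
        # Each worker increments its *own* copy of counter, so you will
        # typically see low values such as [1, 1, 1, 1] or [1, 2, 1, 1]
        # rather than [1, 2, 3, 4].
        print(p.map(job, ['a.txt', 'b.txt', 'c.txt', 'd.txt']))

    print(counter)  # still 0: the parent's copy was never touched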
