
I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).

I have an array of 3000 unique IDs to pull data for from the API and can only call the API with one ID at a time.

I was hoping somehow to make 3 sets of 1000 calls in parallel to speed things up.

What is the best way of doing this?

Thanks in advance for any help!

Did you consider using threads (a separate thread for each request)? –  oleg Jun 7 '13 at 11:28
I am okay with that as long as it's the right option - I imagine the whole affair is 'embarrassingly parallelisable'... –  user7289 Jun 7 '13 at 19:30

1 Answer 1


Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense.

Assuming you have a simple function that processes a single ID:

import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data

You can expand that into a simple function that processes a range of IDs:

def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store

and finally, you can fairly easily map sub-ranges onto threads to allow some number of requests to be concurrent:

from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids,store))
        threads.append(t)

    # start the threads
    # start the threads
    for t in threads:
        t.start()
    # wait for the threads to finish
    for t in threads:
        t.join()
    return store

A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

If your individual tasks take widely varying amounts of time, you may want to use a ThreadPool, which hands out jobs one at a time (often slower when individual tasks are very small, but it guarantees better balance in heterogeneous cases).
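A minimal sketch of that ThreadPool approach, using multiprocessing.pool.ThreadPool; the process_id here is a stand-in (it just doubles the ID) so the sketch runs without the API, but in practice you would pass the real process_id from above:

```python
from multiprocessing.pool import ThreadPool

def process_id(id):
    # stand-in for the API-calling process_id above, so this runs offline
    return id * 2

def pooled_process_range(nthreads, id_range):
    """process the IDs with a thread pool, which hands out jobs one at a time"""
    pool = ThreadPool(nthreads)
    try:
        # map blocks until every ID has been processed
        results = pool.map(process_id, id_range)
    finally:
        pool.close()
        pool.join()
    return dict(zip(id_range, results))

store = pooled_process_range(3, range(10))
```

Unlike the fixed i::nthreads partition, the pool gives each ID to whichever thread frees up next, so one slow request does not hold up a whole pre-assigned chunk.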

Quick one: what does the :: do above? Why not just a single :? –  user7289 Jun 8 '13 at 8:05
It means stride. When you specify a slice, there are three numbers: start:stop:stride. So 1::3 means every third element, starting with index 1, i.e. [1, 4, 7, ...]. This is just a simple way to equally partition a list. –  minrk Jun 9 '13 at 9:35
So the double colon just means the stop is unspecified, and defaults to "the end". –  minrk Jun 9 '13 at 9:36
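To make the partitioning concrete, here is what the stride slices from the answer produce on a small list:

```python
ids = list(range(10))
nthreads = 3

# ids[i::nthreads] takes every nthreads-th element, starting at index i
parts = [ids[i::nthreads] for i in range(nthreads)]
print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# together the slices cover every ID exactly once
assert sorted(sum(parts, [])) == ids
```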
