
I have a DB with a queue table; new entries are inserted into it continuously.

I want a Python script that processes the queue as fast as possible, and I think I need some threaded code to do so, running like a daemon.

But I can't figure out how to use the DB as the queue.

I am looking at this example:

import MySQLdb
from Queue import Queue
from threading import Thread

def do_stuff(q):
    while True:
        print q.get()
        q.task_done()

q = Queue(maxsize=0)
num_threads = 10

for i in range(num_threads):
    worker = Thread(target=do_stuff, args=(q,))
    worker.setDaemon(True)
    worker.start()

# TODO: Use the DB. Note that cursor.execute() returns the number of
# rows, not the rows themselves, so iterate over the cursor instead.
db = MySQLdb.connect(...)
cursor = db.cursor()
cursor.execute("SELECT * FROM queue")

for row in cursor.fetchall():
    q.put(row)
q.join()

I can't understand what you are trying to do with the db. Anyway, Python has a GIL, which means you will most likely get zero performance boost from parallelising this operation. –  mkorpela Sep 26 at 11:25

2 Answers


2 quick points:

  1. Assuming you are using CPython, the GIL will effectively render threading useless for CPU-bound work, allowing only one thread through the interpreter at a time. A couple of workarounds are:

    • The Gevent library [source]

      gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libev event loop.

    • The multiprocessing module - by spawning multiple processes you get true concurrency in Python.

    • The concurrent.futures module - new in Python 3, with a port available for Python 2. [source]

      This is a new high-level library that operates only at a “job” level, which means that you no longer have to fuss with
      synchronization or with managing threads or processes. You just specify a thread or process pool with a certain number of “workers,” submit
      jobs, and collate the results. It’s new in Python 3.2, but a port for Python 2.6+ is available at http://code.google.com/p/pythonfutures.

  2. You can use the SSDictCursor() of MySQLdb and do a fetchone(). This is a streaming (server-side) cursor, and you can run it in an infinite while loop to resemble a queue:

     import time
     import MySQLdb
     import MySQLdb.cursors

     conn = MySQLdb.connect(...)
     # A server-side cursor is requested from the connection,
     # not instantiated directly.
     cur = conn.cursor(MySQLdb.cursors.SSDictCursor)
     cur.execute(query)

     while True:
         row = cur.fetchone()
         if not row:
             break  # or time.sleep() and poll again
         # process the row here
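To make the concurrent.futures option from point 1 concrete, here is a minimal sketch (the do_stuff body and the item list are placeholders standing in for real queue rows; swap ThreadPoolExecutor for ProcessPoolExecutor when the work is CPU-bound and you need to get around the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def do_stuff(item):
    # Placeholder for the real per-row work.
    return item * item

items = range(10)  # stand-in for rows pulled from the queue table

# The pool manages the threads; map() submits one job per item and
# collates the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(do_stuff, items))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```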
Having said all that, I would suggest you look at tools like celery or mongodb to implement queues and workers. Relational databases are just not cut out for that kind of a job and suffer unnecessary fragmentation. Here's a great source if you want to know more about fragmentation in mysql.
    
gevent doesn't allow you to bypass the GIL, and neither does concurrent.futures.ThreadPoolExecutor. Only multiprocessing and concurrent.futures.ProcessPoolExecutor do that. Note that threads and gevent are still useful for I/O bound operations, because doing I/O releases the GIL. –  dano Oct 1 at 14:50
    
That's correct. It's only by spawning multiple processes that we will be able to achieve true concurrency here. However, often what we are trying to achieve through concurrency is actually asynchronous processing, and gevent fits in beautifully there. –  Abhishek Pathak Oct 1 at 15:25

I am not sure if it's the best solution, but I would think of a structure with a main thread which reads the db and fills the Queue. Make sure to avoid doublets; checking against an auto-incrementing primary key would make that easy.
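As a rough sketch of that main-thread idea (using the stdlib sqlite3 module here purely so the example is self-contained and runnable; the table and column names are invented, and with MySQLdb the pattern is the same): the producer remembers the highest primary key it has already enqueued, so re-reading the table never produces doublets.

```python
import sqlite3

try:
    from queue import Queue   # Python 3
except ImportError:
    from Queue import Queue   # Python 2

# In-memory DB standing in for the real MySQL queue table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO queue (payload) VALUES (?)",
               [("job-a",), ("job-b",), ("job-c",)])

q = Queue()
last_id = 0  # highest primary key already enqueued

def poll_new_rows():
    # Only fetch rows we have not seen yet -- this is what avoids doublets.
    global last_id
    cur = db.execute(
        "SELECT id, payload FROM queue WHERE id > ? ORDER BY id", (last_id,))
    for row_id, payload in cur:
        q.put(payload)
        last_id = row_id

poll_new_rows()
db.execute("INSERT INTO queue (payload) VALUES (?)", ("job-d",))
poll_new_rows()  # picks up only the new row

drained = []
while not q.empty():
    drained.append(q.get())
print(drained)  # ['job-a', 'job-b', 'job-c', 'job-d']
```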

The worker structure is nice, but as mentioned in the comments, the GIL will prevent any speedup. You could use multiprocessing instead if your "do_stuff" is independent of the script itself (e.g. the tasks are pictures and "do_stuff" is "rotate every picture 90°"). As far as I know, multiprocessing doesn't suffer from the GIL.

https://docs.python.org/2/library/subprocess.html gives you some information about that.
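A minimal multiprocessing sketch of the suggestion above (do_stuff here is a trivial stand-in for independent, CPU-heavy work such as rotating one picture; the names are made up for the sketch):

```python
from multiprocessing import Pool

def do_stuff(task):
    # Stand-in for independent CPU-heavy work (e.g. rotating one picture).
    return task * 2

if __name__ == '__main__':
    tasks = [1, 2, 3, 4]
    # Each worker is a separate process with its own interpreter and
    # its own GIL, so the work runs truly in parallel.
    pool = Pool(processes=2)
    results = pool.map(do_stuff, tasks)
    pool.close()
    pool.join()
    print(results)  # [2, 4, 6, 8]
```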

PS: English isn't my native language.

    
I think the multiprocessing module will be more relevant than subprocess in this case. –  Abhishek Pathak Sep 30 at 19:38
