
I want to extract blobs from rows into individual files - and am using Python code that looks like this:

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)
cur = connDB.cursor()
cmd = """
select id || '.pdf' as filename, blobColumn as binaryData
from someTable
"""
cur.execute(cmd)
while True:
    row = cur.fetchone()
    if row is None:
        break
    print "Generating:", row[0]
    with open(row[0], 'wb') as f:  # 'wb': blobs are binary data
        f.write(row[1][:])

This works fine, but when the table contains more than, say, 500 rows, I start noticing a delay when starting the script - as if the database is "taking a picture" up-front and then "feeding" me that picture. The end result is that there's no output for some time, and then the blobs start pouring out.

If my hunch is correct:

  • I'd very much like to avoid this "picture" - because blobs are big, and so will the "picture", straining the DB and delaying my output for no reason.
  • An alternative I am contemplating is reading all IDs of the rows up-front, adding them into a Python array, and then looping over that array, executing one select blobColumn where id=... per iteration. Shouldn't this be better?
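The contemplated two-pass alternative might be sketched roughly like this (table and column names are taken from the question; the exact query split and the generator shape are assumptions for illustration):

```python
def fetch_ids_then_blobs(cur):
    # Pass 1: fetch only the ids - a small result set, even for many rows.
    cur.execute("select id from someTable")
    ids = [r[0] for r in cur.fetchall()]
    # Pass 2: fetch one blob per query, so no big snapshot is built up-front.
    for row_id in ids:
        cur.execute("select blobColumn from someTable where id = %s",
                    (row_id,))
        yield "%s.pdf" % row_id, cur.fetchone()[0]
```

This trades the single big result set for one round trip per row, which is why the question asks whether there is a better way.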

Thanks for any hints - and even though this is about Python, PostgreSQL and psycopg2, I'd be interested to hear whether this is DB-engine specific, whether I am missing some better use of the psycopg2 API, etc.


1 Answer (accepted)

Try using a named cursor. Its itersize attribute defaults in psycopg2 to 2000, but since you have noticed a threshold around 500 rows, try setting it much lower than that, say 100 or 200. You create a named cursor simply by passing a name argument to the cursor() constructor.

The named cursor will allow psycopg2 to automatically query in batches by setting up a server-side cursor for you. On the Python side, you simply iterate as usual, and it will fetch more from the server when necessary.

Also, it may be instructive to look at the query plan for the larger-sized tables to see how they differ from the smaller ones.

With a named cursor, after you run execute() you simply iterate over the cursor instead of calling fetchone() or similar, i.e.:

for row in cur:
    # do stuff here
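Putting the pieces together, a minimal sketch of the questioner's loop on top of a named (server-side) cursor might look like the following. The file-writing part is factored into a helper that accepts any iterable of (filename, data) pairs, so the same code works with a named cursor; the cursor name and the itersize value are arbitrary choices, not requirements:

```python
import os

def dump_blobs(rows, out_dir="."):
    """Write (filename, binary_data) pairs to individual files.

    `rows` can be any iterable of 2-tuples - in particular a psycopg2
    named cursor, which streams rows from the server in batches of
    `itersize` instead of materializing the whole result up-front.
    """
    written = []
    for filename, data in rows:
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as f:      # blobs are binary: open with 'wb'
            f.write(bytes(data))
        written.append(filename)
    return written

def dump_from_db(conn):
    # Passing a name makes psycopg2 create a server-side cursor,
    # so rows arrive in batches rather than as one big snapshot.
    cur = conn.cursor(name="blob_dump")  # name is arbitrary
    cur.itersize = 100                   # rows fetched per round trip
    cur.execute("select id || '.pdf', blobColumn from someTable")
    return dump_blobs(cur)               # iterate, don't fetchall()
```

The key point is that iterating the named cursor never asks the server for more than itersize rows at a time, which is exactly what removes the start-up delay described in the question.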
    
Perfect suggestion, worked like a charm. All I had to do was switch to "cur = connDB.cursor(name='test')" and then "cur.itersize=10" - the blobs then started appearing instantly. Thanks! –  ttsiodras Jun 19 at 7:20
    
Sure, no problem. –  Ken Hampson Jun 19 at 19:20
