
I want to extract blobs from rows into individual files, and am using Python code that looks like this:

import psycopg2

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)
cur = connDB.cursor()
cmd = """
select id || '.pdf' as filename, blobColumn as binaryData
from someTable
"""
cur.execute(cmd)
while True:
    row = cur.fetchone()
    if row is None:
        break
    print("Generating:", row[0])
    with open(row[0], 'wb') as f:  # binary mode, so blob bytes are written untouched
        f.write(row[1])

This works fine, but when the table contains more than e.g. 500 rows, I start noticing a delay when starting the script - as if the database is "taking a picture" up-front, and then "feeding" me this picture. The end result is that there's no output for some time, and then the blobs start pouring out.

If my hunch is correct:

  • I'd very much like to avoid this "picture" - because blobs are big, and so will the "picture", straining the DB and delaying my output for no reason.
  • An alternative I am contemplating is reading all IDs of the rows up-front, adding them into a Python array, and then looping over that array, executing one select blobColumn where id=... per iteration. Shouldn't this be better?

Thanks for any hints - and even though this is about Python, PostgreSQL and psycopg2, I'd be interested to hear whether this is DB-engine specific, whether I am missing some better use of the psycopg2 API, etc.

1 Answer


Try using a named cursor. Its itersize defaults in psycopg2 to 2000 rows per batch, but since you noticed a threshold around 500, try setting it much lower than that, say 100 or 200. You create one simply by providing a name parameter to the cursor constructor.

The named cursor makes psycopg2 set up a server-side cursor for you and query in batches automatically. On the Python side, you simply iterate as usual, and it fetches more rows from the server whenever necessary.

Also, it may be instructive to look at the query plan for the larger-sized tables to see how they differ from the smaller ones.

If you use the named cursor, then after you run the execute you simply iterate over the cursor instead of calling fetchone or similar.

i.e.

for row in cur:
    # do stuff here
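Putting the answer together with the original script, a minimal sketch might look like the following. The cursor name "blob_export", the itersize of 100, and the connection parameters are all illustrative choices, not anything mandated by psycopg2; this assumes a live PostgreSQL server with the poster's someTable.

```python
import psycopg2

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)

# Passing a name creates a server-side cursor; psycopg2 then fetches
# rows from the server in batches of cur.itersize instead of all at once.
cur = connDB.cursor(name="blob_export")
cur.itersize = 100  # smaller batches, so output starts sooner

cur.execute("""
    select id || '.pdf' as filename, blobColumn as binaryData
    from someTable
""")

for filename, data in cur:
    print("Generating:", filename)
    with open(filename, "wb") as f:  # binary mode for blob data
        f.write(data)

cur.close()
connDB.close()
```

With this structure the first rows arrive after the first batch of 100 is fetched, rather than after the whole result set has been materialized.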
  • Perfect suggestion, worked like a charm. All I had to do was switch to "cur = connDB.cursor(name='test')" and then set "cur.itersize = 10" - the blobs then started appearing instantly. Thanks! Commented Jun 19, 2014 at 7:20
