
I want to extract blobs from rows into individual files, and am using Python code that looks like this:

import psycopg2

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)
cur = connDB.cursor()
cmd = """
select id || '.pdf' as filename, blobColumn as binaryData
from someTable
"""
cur.execute(cmd)
while True:
    row = cur.fetchone()
    if row is None:
        break
    print("Generating:", row[0])
    with open(row[0], 'wb') as f:  # binary mode, so blob bytes are written untouched
        f.write(row[1])

This works fine, but when the table contains more than e.g. 500 rows, I start noticing a delay when starting the script - as if the database is "taking a picture" up-front, and then "feeding" me this picture. The end result is that there's no output for some time, and then the blobs start pouring out.

If my hunch is correct:

  • I'd very much like to avoid this "picture" - because blobs are big, and so will the "picture", straining the DB and delaying my output for no reason.
  • An alternative I am contemplating is reading all IDs of the rows up-front, adding them into a Python array, and then looping over that array, executing one select blobColumn where id=... per iteration. Shouldn't this be better?

Thanks for any hints - and even though this is about Python, PostgreSQL and psycopg2, I'd be interested to hear whether this is DB-engine specific, whether I am missing some better use of the psycopg2 API, etc.

1 Answer


Try using a named cursor. Its itersize defaults in psycopg2 to 2000 rows per batch, but since you noticed a threshold around 500, try setting it much lower than that, say 100 or 200. You create one simply by providing a name parameter to the cursor constructor.

The named cursor makes psycopg2 set up a server-side cursor for you and query in batches automatically. On the Python side, you simply iterate as usual, and it fetches more rows from the server whenever necessary.

Also, it may be instructive to look at the query plan for the larger-sized tables to see how they differ from the smaller ones.

If you use the named cursor, then after you run the execute you simply iterate over the cursor instead of calling fetchone or similar.

i.e.

for row in cur:
    # do stuff here
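Putting the answer together with the original script, a minimal sketch might look like the following. The cursor name "blob_export", the itersize of 100, and the connection parameters are all illustrative choices, not anything mandated by psycopg2; this assumes a live PostgreSQL server with the poster's someTable.

```python
import psycopg2

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)

# Passing a name creates a server-side cursor; psycopg2 then fetches
# rows from the server in batches of cur.itersize instead of all at once.
cur = connDB.cursor(name="blob_export")
cur.itersize = 100  # smaller batches, so output starts sooner

cur.execute("""
    select id || '.pdf' as filename, blobColumn as binaryData
    from someTable
""")

for filename, data in cur:
    print("Generating:", filename)
    with open(filename, "wb") as f:  # binary mode for blob data
        f.write(data)

cur.close()
connDB.close()
```

With this structure the first rows arrive after the first batch of 100 is fetched, rather than after the whole result set has been materialized.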
  • Perfect suggestion, worked like a charm. All I had to do was switch to "cur = connDB.cursor(name='test')" and then set "cur.itersize = 10" - the blobs then started appearing instantly. Thanks! Commented Jun 19, 2014 at 7:20
