
I want to extract blobs from rows into individual files - and am using Python code that looks like this:

connDB = psycopg2.connect(
    database=strDbName, user=strUser, password=strPass,
    host=strHost, port=strPort)
cur = connDB.cursor()
cmd = """
select id || '.pdf' as filename, blobColumn as binaryData
from someTable
"""
cur.execute(cmd)
while True:
    row = cur.fetchone()
    if row is None:
        break
    print "Generating:", row[0]
    with open(row[0], 'wb') as f:  # 'wb': blobs are binary data
        f.write(row[1][:])

This works fine, but when the table contains more than, say, 500 rows, I start noticing a delay when starting the script - as if the database is "taking a picture" up-front and then "feeding" me that picture. The end result is that there's no output for some time, and then the blobs start pouring out.

If my hunch is correct:

  • I'd very much like to avoid this "picture" - because blobs are big, and so will the "picture", straining the DB and delaying my output for no reason.
  • An alternative I am contemplating is reading all IDs of the rows up-front, adding them into a Python array, and then looping over that array, executing one select blobColumn where id=... per iteration. Shouldn't this be better?
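The contemplated two-pass alternative might be sketched roughly like this (table and column names are taken from the question; the exact query split and the generator shape are assumptions for illustration):

```python
def fetch_ids_then_blobs(cur):
    # Pass 1: fetch only the ids - a small result set, even for many rows.
    cur.execute("select id from someTable")
    ids = [r[0] for r in cur.fetchall()]
    # Pass 2: fetch one blob per query, so no big snapshot is built up-front.
    for row_id in ids:
        cur.execute("select blobColumn from someTable where id = %s",
                    (row_id,))
        yield "%s.pdf" % row_id, cur.fetchone()[0]
```

This trades the single big result set for one round trip per row, which is why the question asks whether there is a better way.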

Thanks for any hints - and even though this is about Python, PostgreSQL and psycopg2, I'd be interested to hear whether this is DB-engine specific, whether I am missing some better use of the psycopg2 API, etc.


1 Answer (accepted)

Try using a named cursor. Its itersize attribute defaults in psycopg2 to 2000, but since you have noticed a threshold around 500 rows, try setting it much lower than that, say 100 or 200. You create a named cursor simply by passing a name argument to the cursor() constructor.

The named cursor will allow psycopg2 to automatically query in batches by setting up a server-side cursor for you. On the Python side, you simply iterate as usual, and it will fetch more from the server when necessary.

Also, it may be instructive to look at the query plan for the larger-sized tables to see how they differ from the smaller ones.

With a named cursor, after you run execute() you simply iterate over the cursor instead of calling fetchone() or similar, i.e.:

for row in cur:
    # do stuff here
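Putting the pieces together, a minimal sketch of the questioner's loop on top of a named (server-side) cursor might look like the following. The file-writing part is factored into a helper that accepts any iterable of (filename, data) pairs, so the same code works with a named cursor; the cursor name and the itersize value are arbitrary choices, not requirements:

```python
import os

def dump_blobs(rows, out_dir="."):
    """Write (filename, binary_data) pairs to individual files.

    `rows` can be any iterable of 2-tuples - in particular a psycopg2
    named cursor, which streams rows from the server in batches of
    `itersize` instead of materializing the whole result up-front.
    """
    written = []
    for filename, data in rows:
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as f:      # blobs are binary: open with 'wb'
            f.write(bytes(data))
        written.append(filename)
    return written

def dump_from_db(conn):
    # Passing a name makes psycopg2 create a server-side cursor,
    # so rows arrive in batches rather than as one big snapshot.
    cur = conn.cursor(name="blob_dump")  # name is arbitrary
    cur.itersize = 100                   # rows fetched per round trip
    cur.execute("select id || '.pdf', blobColumn from someTable")
    return dump_blobs(cur)               # iterate, don't fetchall()
```

The key point is that iterating the named cursor never asks the server for more than itersize rows at a time, which is exactly what removes the start-up delay described in the question.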
    
Perfect suggestion, worked like a charm. All I had to do was switch to "cur = connDB.cursor(name='test')" and then "cur.itersize=10" - the blobs then started appearing instantly. Thanks! –  ttsiodras Jun 19 at 7:20
    
Sure, no problem. –  Ken Hampson Jun 19 at 19:20
