I want to extract blobs from rows into individual files, and am using Python code that looks like this:
    import psycopg2

    connDB = psycopg2.connect(
        database=strDbName, user=strUser, password=strPass,
        host=strHost, port=strPort)
    cur = connDB.cursor()
    cmd = """
        select id || '.pdf' as filename, blobColumn as binaryData
        from someTable
        """
    cur.execute(cmd)
    while True:
        row = cur.fetchone()
        if row is None:
            break
        print "Generating:", row[0]
        with open(row[0], 'wb') as f:  # binary mode - the blobs are not text
            f.write(row[1][:])
This works fine, but when the table contains more than, say, 500 rows, I start noticing a delay before the script produces any output - as if the database is "taking a picture" of the whole result set up front, and then "feeding" me that picture. The end result is that there's no output for some time, and then the blobs start pouring out.
If my hunch is correct:
- I'd very much like to avoid this "picture" - because blobs are big, and so will the "picture", straining the DB and delaying my output for no reason.
- An alternative I am contemplating is reading all the row IDs up front into a Python list, and then looping over that list, executing one

      select blobColumn from someTable where id = ...

  per iteration. Shouldn't that be better?
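For concreteness, here is a minimal sketch of that two-pass alternative. The table and column names are the same placeholders as above, and the `placeholder` argument is my own addition so the function runs against any DB-API driver (`"%s"` for psycopg2) - it is illustrative, not a claim about the right API:

```python
def export_blobs(conn, placeholder="%s"):
    # Two-pass approach: fetch the (small) list of ids first, then issue
    # one blob-sized query per id, so no large result set has to be
    # buffered on the client before the first file is written.
    cur = conn.cursor()
    cur.execute("select id from someTable")
    ids = [r[0] for r in cur.fetchall()]  # ids only - cheap to hold in memory
    written = []
    for row_id in ids:
        cur.execute(
            "select blobColumn from someTable where id = " + placeholder,
            (row_id,))
        blob = cur.fetchone()[0]
        filename = "%s.pdf" % row_id
        with open(filename, "wb") as f:  # binary mode for blob data
            f.write(bytes(blob))         # bytes() also handles buffer/memoryview
        written.append(filename)
    return written
```

The obvious cost is one round trip per row instead of one query total, which is part of why I'm asking whether this is actually better.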
Thanks for any hints - and although this is about Python, PostgreSQL and psycopg2, I'd be interested to hear whether this behaviour is DB-engine specific, whether I am missing some better use of the psycopg2 API, etc.