
I have a dump made from a PostgreSQL database. I want to check for some information in that dump: specifically, whether there are entries in a certain table with certain values in certain fields.

This is for a Python program that should run automatically on many different inputs on customer machines, so I need a programmatic solution, not manually opening the file and looking for where that table is defined. I could restore the dump to a database and then delete it, but I'm worried that this operation is heavy or has side effects. I want the check to have no side effects at all; it should not affect anything on my system.

Is that possible in any way? Preferably in Python?

  • If each table is its own dump file, you can use grep, awk, or regular expressions inside Python; otherwise there's no easy way. Commented Jun 5, 2016 at 4:35
  • 1) There are at least three different formats for pg_dump. 2) pg_restore knows about these. 3) You could try to emulate pg_restore ... | psql ..., or use pg_restore ... | grep ... for simple cases. 4) Or just restore into a (temporary) database and work from there. Commented Jun 5, 2016 at 11:12

3 Answers


Any dump format: restore and query

The most practical thing to do is restore the dump to a temporary PostgreSQL database, then query that database. It's by far the simplest option. If you have a non-superuser role with CREATEDB rights, you can do this trivially and safely with pg_restore.
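
Here is a minimal sketch of that flow, assuming a custom-format dump, a local server reachable through the usual libpq environment variables, a role with CREATEDB rights, and psycopg2 installed. The dump path, table, column, and value are placeholders:

    import subprocess
    import uuid

    import psycopg2  # third-party driver; assumed installed

    DUMP_PATH = "out.dump"                          # hypothetical dump path
    TEMP_DB = "tmp_check_" + uuid.uuid4().hex[:8]   # throwaway database name

    # Create a scratch database (needs a role with CREATEDB rights; connection
    # settings come from the usual libpq environment variables).
    subprocess.run(["createdb", TEMP_DB], check=True)
    try:
        # Restore the dump into the scratch database only.
        subprocess.run(
            ["pg_restore", "--no-owner", "-d", TEMP_DB, DUMP_PATH],
            check=True,
        )

        # Run the actual check; table, column, and value are example names.
        conn = psycopg2.connect(dbname=TEMP_DB)
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT EXISTS (SELECT 1 FROM address WHERE city = %s)",
                    ("Berlin",),
                )
                found = cur.fetchone()[0]
        finally:
            conn.close()  # must close before dropdb, or the drop fails

        print("matching rows present:", found)
    finally:
        # Drop the scratch database so nothing is left behind.
        subprocess.run(["dropdb", TEMP_DB], check=True)

Because everything happens in a throwaway database that is dropped at the end, the check leaves no state behind on the server.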

SQL-format

If it's a plain-text (.sql) dump, you can get away with searching the file directly, provided the dump was not created with the --inserts or --column-inserts options and you don't use the same table name in multiple schemas. Search for the text

COPY tablename (

at the start of a line, then read the COPY-format data (see below) until you find \. at the start of a line.

If you do use the same table name in different schemas, you have to parse the dump to find the SET search_path entry for the schema you want, then start looking for the desired table's COPY statement.
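
Here is a rough sketch of that scanning approach in Python. It assumes a plain-text dump with unquoted column names and either qualified or unqualified COPY headers; the file, table, column, and value names are placeholders, and it does not unescape COPY's backslash sequences:

    import re

    DUMP_PATH = "dump.sql"      # hypothetical plain-text dump
    TABLE = "address"           # table to look for (example name)
    COLUMN = "city"             # column to check (example name)
    VALUE = "Berlin"            # value to search for (example)

    found = False
    with open(DUMP_PATH, encoding="utf-8") as f:
        in_copy = False
        col_index = None
        for line in f:
            if not in_copy:
                # Match e.g. "COPY address (id, city) FROM stdin;" or
                # "COPY public.address (id, city) FROM stdin;".
                m = re.match(
                    r"COPY (?:\w+\.)?%s \(([^)]*)\) FROM stdin;" % re.escape(TABLE),
                    line,
                )
                if m:
                    columns = [c.strip() for c in m.group(1).split(",")]
                    col_index = columns.index(COLUMN)
                    in_copy = True
            else:
                if line.rstrip("\n") == r"\.":   # end-of-data marker
                    in_copy = False
                    continue
                # COPY data is tab-separated; tabs/backslashes inside values
                # are escaped, so this naive split is only a sketch.
                fields = line.rstrip("\n").split("\t")
                if fields[col_index] == VALUE:
                    found = True
                    break
    print("matching rows present:", found)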

Custom-format

However, if the dump is in the PostgreSQL custom format, which you should always prefer and request by using -Fc with pg_dump, it is a binary archive with its own header, not a plain tar file (pg_dump's tar format is the separate -Ft option). You could try to parse the archive yourself, but it is far easier to let pg_restore read it for you: it can list the dump's contents and extract individual tables.

To list the tables in the dump:

pg_restore --list out.dump

To dump a particular table in tab-separated COPY format by qualified name, e.g. table address in schema public:

pg_restore -n public -t address out.dump

The output has a bunch of stuff at the start that you can't get pg_restore to skip, but your script can just look for the word COPY (uppercase) at the start of a line, start reading data on the next line, and stop when it reaches a \. on a line by itself. For details on the format, see the PostgreSQL manual page on COPY.

Of course you need the pg_restore binary for this.


Make sure you do not pass -d/--dbname when you invoke pg_restore for this. Without it, pg_restore prints the extracted SQL to stdout; with it, it would restore into that database instead.
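
Putting the above together, here is a sketch of driving pg_restore from Python and scanning its stdout. It assumes pg_restore is on the PATH, a custom-format dump, and reuses the example schema and table names from above; note that no -d flag is passed, so the output goes to stdout:

    import subprocess

    DUMP_PATH = "out.dump"   # the example custom-format dump from above

    # Extract just one table's data as SQL on stdout; no -d flag, so
    # pg_restore prints a script instead of restoring anywhere.
    proc = subprocess.run(
        ["pg_restore", "-n", "public", "-t", "address", DUMP_PATH],
        capture_output=True,
        text=True,
        check=True,
    )

    rows = []
    in_copy = False
    for line in proc.stdout.splitlines():
        if in_copy:
            if line == r"\.":                 # end-of-data marker
                break
            rows.append(line.split("\t"))     # tab-separated COPY data
        elif line.startswith("COPY "):        # start of the table's data block
            in_copy = True

    # rows now holds one list of field strings per table row; check whatever
    # columns/values you need here.
    print(len(rows), "rows extracted")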



Dump the database to a CSV file (or one CSV file per table), then load and query the files with pandas.
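
For example, a minimal sketch assuming the table was already exported to address.csv (the file, column, and value names are placeholders):

    import pandas as pd

    # "address.csv" is assumed to have been exported beforehand, e.g. with:
    #   psql -c "\copy address TO 'address.csv' WITH (FORMAT csv, HEADER)"
    df = pd.read_csv("address.csv")

    # Check whether any row matches the criteria (column/value are examples).
    matches = df[df["city"] == "Berlin"]
    print("matching rows present:", not matches.empty)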


You could convert your dump to an INSERT INTO-style dump with this little tool I've written:

https://github.com/freddez/pg-dump2insert

It will be easier to grep specific table data in this form.
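
As a rough illustration, scanning such an INSERT-style dump could look like this; the file, table, and value are placeholders, and the exact statement layout depends on the tool's output:

    import re

    # Look for INSERT statements targeting one table that mention the value;
    # treat this as a sketch, since quoting and layout vary.
    pattern = re.compile(r"INSERT INTO (?:\w+\.)?address\b.*'Berlin'")

    with open("dump_inserts.sql", encoding="utf-8") as f:
        found = any(pattern.search(line) for line in f)
    print("matching rows present:", found)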
