This is a Python programming tutorial for the PostgreSQL database. It covers the basics of PostgreSQL programming with the Python language. You might also want to check the Python tutorial or PostgreSQL PHP tutorial on ZetCode.
Several libraries exist for connecting to the PostgreSQL database from the Python language. In this tutorial we will use the psycopg2 module. It is a PostgreSQL database adapter for the Python programming language. According to the module documentation it is currently the most popular Python module for the PostgreSQL database. It is mostly implemented in C as a libpq wrapper.
PostgreSQL is a powerful, open source object-relational database system. It is a multi-user, multi-threaded database management system. It runs on multiple platforms including Linux, FreeBSD, Solaris, Microsoft Windows and Mac OS X. PostgreSQL is developed by the PostgreSQL Global Development Group.
To work with this tutorial, we must have Python language, PostgreSQL database and psycopg2 language binding installed on our system.
$ sudo apt-get install postgresql
On an Ubuntu based system we can install the PostgreSQL database using the above command.
$ sudo update-rc.d -f postgresql remove Removing any system startup links for /etc/init.d/postgresql ... /etc/rc0.d/K21postgresql /etc/rc1.d/K21postgresql /etc/rc2.d/S19postgresql /etc/rc3.d/S19postgresql /etc/rc4.d/S19postgresql /etc/rc5.d/S19postgresql /etc/rc6.d/K21postgresql
If we install the PostgreSQL database from packages, it is automatically added to the start up scripts of the operating system. If we are only learning to work with the database, it is unnecessary to start the database each time we boot the system. The above command removes any system startup links for the PostgreSQL database.
$ /etc/init.d/postgresql status Running clusters: 9.1/main $ service postgresql status Running clusters: 9.1/main
We check if the PostgreSQL server is running. If not, we need to start the server.
$ sudo service postgresql start * Starting PostgreSQL 9.1 database server [ OK ]
On Ubuntu Linux we can start the server with the service postgresql start command.
$ sudo service postgresql stop [sudo] password for janbodnar: * Stopping PostgreSQL 9.1 database server [ OK ]
We use the service postgresql stop command to stop the PostgreSQL server.
$ sudo apt-get install python-psycopg2
Here we install the psycopg2 module on a Ubuntu system.
$ sudo -u postgres createuser janbodnar Shall the new role be a superuser? (y/n) n Shall the new role be allowed to create databases? (y/n) y Shall the new role be allowed to create more new roles? (y/n) n
We create a new role in the PostgreSQL system. We allow it to have ability to create new databases. A role is a user in a database world. Roles are separate from operating system users. We have created a new user without the -W option, e.g. we have not specified a password. This enables us to connect to a database with this user without password authentication. Note that this works only on localhost.
$ sudo -u postgres createdb testdb -O janbodnar
The createdb
command creates a new PostgreSQL database with the
owner janbodnar.
In the first code example, we will get the version of the PostgreSQL database.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute('SELECT version()') ver = cur.fetchone() print ver except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In the above Python script we connect to the previously created testdb database. We execute an SQL statement which returns the version of the PostgreSQL database.
import psycopg2
The psycopg2
is a Python module which is used to work
with the PostgreSQL database.
con = None
We initialize the con variable to None. In case we could not create a connection to the database (for example the disk is full), we would not have a connection variable defined. This would lead to an error in the finally clause.
con = psycopg2.connect(database='testdb', user='janbodnar')
The connect()
method creates a new database session and
returns a connection object. The user was created without a password.
On localhost, we can omit the password option. Otherwise, it must
be specified.
cur = con.cursor() cur.execute('SELECT version()')
From the connection, we get the cursor object. The cursor is used to traverse
the records from the result set. We call the execute()
method of the cursor
and execute the SQL statement.
er = cur.fetchone()
We fetch the data. Since we retrieve only one record, we call the
fetchone()
method.
print ver
We print the data that we have retrieved to the console.
except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1)
In case of an exception, we print an error message and exit the script with an error code 1.
finally: if con: con.close())
In the final step, we release the resources.
$ ./version.py version --------------------------------------------------- --------------------------------------------------------- PostgreSQL 9.1.2 on i686-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 32-bit (1 row)
Running the version.py script.
We will create a Cars table and insert several rows to it.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name VARCHAR(20), price INT)") cur.execute("INSERT INTO cars VALUES(1,'Audi',52642)") cur.execute("INSERT INTO cars VALUES(2,'Mercedes',57127)") cur.execute("INSERT INTO cars VALUES(3,'Skoda',9000)") cur.execute("INSERT INTO cars VALUES(4,'Volvo',29000)") cur.execute("INSERT INTO cars VALUES(5,'Bentley',350000)") cur.execute("INSERT INTO cars VALUES(6,'Citroen',21000)") cur.execute("INSERT INTO cars VALUES(7,'Hummer',41400)") cur.execute("INSERT INTO cars VALUES(8,'Volkswagen',21600)") con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
The above script creates a Cars table and inserts 8 rows into the table.
cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name VARCHAR(20), price INT)")
This SQL statement creates a new cars table. The table has three columns.
cur.execute("INSERT INTO Cars VALUES(1,'Audi',52642)") cur.execute("INSERT INTO Cars VALUES(2,'Mercedes',57127)")
These two lines insert two cars into the table.
con.commit()
The changes are committed to the database.
if con: con.rollback()
In case of an error, we roll back any possible changes to our database table.
$ psql testdb psql (9.1.2) Type "help" for help. testdb=> SELECT * FROM cars; id | name | price ----+------------+-------- 1 | Audi | 52642 2 | Mercedes | 57127 3 | Skoda | 9000 4 | Volvo | 29000 5 | Bentley | 350000 6 | Citroen | 21000 7 | Hummer | 41400 8 | Volkswagen | 21600 (8 rows)
We verify the written data with the psql tool.
We are going to create the same table. This time using the convenience
executemany()
method.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys cars = ( (1, 'Audi', 52642), (2, 'Mercedes', 57127), (3, 'Skoda', 9000), (4, 'Volvo', 29000), (5, 'Bentley', 350000), (6, 'Citroen', 21000), (7, 'Hummer', 41400), (8, 'Volkswagen', 21600) ) con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("DROP TABLE IF EXISTS cars") cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name TEXT, price INT)") query = "INSERT INTO cars (id, name, price) VALUES (%s, %s, %s)" cur.executemany(query, cars) con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
This script drops a Cars table if it exists and (re)creates it.
cur.execute("DROP TABLE IF EXISTS cars") cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name TEXT, price INT)")
The first SQL statement drops the Cars table, if it exists. The second SQL statement creates the Cars table.
query = "INSERT INTO cars (id, name, price) VALUES (%s, %s, %s)"
This is the query that we will use.
cur.executemany(query, cars)
We insert 8 rows into the table using the convenience executemany()
method.
The first parameter of this method is a parameterized SQL statement. The second
parameter is the data, in the form of tuple of tuples.
Now, that we have inserted some data into the database, we want to get it back.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("SELECT * FROM cars") rows = cur.fetchall() for row in rows: print row except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In this example, we retrieve all data from the cars table.
cur.execute("SELECT * FROM Cars")
This SQL statement selects all data from the Cars table.
rows = cur.fetchall()
The fetchall()
method gets all records. It returns
a result set. Technically, it is a tuple of tuples. Each of the inner tuples
represent a row in the table.
for row in rows: print row
We print the data to the console, row by row.
$ ./fetch1.py (1, 'Audi', 52642) (2, 'Mercedes', 57127) (3, 'Skoda', 9000) (4, 'Volvo', 29000) (5, 'Bentley', 350000) (6, 'Citroen', 21000) (7, 'Hummer', 41400) (8, 'Volkswagen', 21600)
This is the output of the example.
Returning all data at a time may not be feasible. We can fetch rows one by one.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("SELECT * FROM cars") while True: row = cur.fetchone() if row == None: break print row[0], row[1], row[2] except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In this script we connect to the database and fetch the rows of the cars table one by one.
while True:
We access the data from the while loop. When we read the last row, the loop is terminated.
row = cur.fetchone() if row == None: break
The fetchone()
method returns the next row from
the table. If there is no more data left, it returns None
.
In this case we break the loop.
print row[0], row[1], row[2]
The data is returned in the form of a tuple. Here we select records from the tuple. The first is the Id, the second is the car name and the third is the price of the car.
$ ./retrieveonebyone.py 1 Audi 52642 2 Mercedes 57127 3 Skoda 9000 4 Volvo 29000 5 Bentley 350000 6 Citroen 21000 7 Hummer 41400 8 Volkswagen 21600
This is the output of the example.
The default cursor returns the data in a tuple of tuples. When we use a dictionary cursor, the data is sent in a form of Python dictionaries. This way we can refer to the data by their column names.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import psycopg2.extras import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cursor = con.cursor(cursor_factory=psycopg2.extras.DictCursor) cursor.execute("SELECT * FROM Cars") rows = cursor.fetchall() for row in rows: print "%s %s %s" % (row["id"], row["name"], row["price"]) except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In this example, we print the contents of the cars table using the dictionary cursor.
import psycopg2.extras
The dictionary cursor is located in the extras module.
cursor = con.cursor(cursor_factory=psycopg2.extras.DictCursor)
We create a DictCursor
.
for row in rows: print "%s %s %s" % (row["id"], row["name"], row["price"])
The data is accessed by the column names.
Now we will concern ourselves with parameterized queries. When we use parameterized queries, we use placeholders instead of directly writing the values into the statements. Parameterized queries increase security and performance.
The Python psycopg2 module supports two types of placeholders. Ansi C printf format and Python extended format.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None uId = 1 uPrice = 62300 try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("UPDATE Cars SET price=%s WHERE id=%s", (uPrice, uId)) con.commit() print "Number of rows updated: %d" % cur.rowcount except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
We update a price of one car. In this code example, we use the question mark placeholders.
cur.execute("UPDATE Cars SET price=%s WHERE id=%s", (uPrice, uId))
The characters (%s) are placeholders for values. The values are added to the placeholders.
print "Number of rows updated: %d" % cur.rowcount
The rowcount
property returns the number of updated
rows. In our case one row was updated.
$ ./prepared.py Number of rows updated: 1 testdb=> SELECT * FROM cars WHERE id=1; id | name | price ----+------+------- 1 | Audi | 62300 (1 row)
The price of the car was updated. We check the change with the psql tool.
The second example uses parameterized statements with Python extended format.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None uid = 3 try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("SELECT * FROM cars WHERE id=%(id)s", {'id': uid } ) print cur.fetchone() except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
We select a name and a price of a car using pyformat parameterized statement.
cur.execute("SELECT * FROM cars WHERE id=%(id)s", {'id': uid } )
The named placeholders start with a colon character.
$ ./parameterized2.py (3, 'Skoda', 9000)
Output of the example.
In this section, we are going to insert an image to the PostgreSQL database. Note that some people argue against putting images into databases. Here we only show how to do it. We do not dwell into technical issues of whether to save images in databases or not.
testdb=> CREATE TABLE images(id INT PRIMARY KEY, data BYTEA);
For this example, we create a new table called images. For the images, we use
the BYTEA
data type. It allows to store binary strings.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys def readImage(): try: fin = open("woman.jpg", "rb") img = fin.read() return img except IOError, e: print "Error %d: %s" % (e.args[0],e.args[1]) sys.exit(1) finally: if fin: fin.close() try: con = psycopg2.connect(database="testdb", user="janbodnar") cur = con.cursor() data = readImage() binary = psycopg2.Binary(data) cur.execute("INSERT INTO images(id, data) VALUES (1, %s)", (binary,) ) con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In this script, we read an image from the current working directory and write it into the images table of the PostgreSQL testdb database.
try: fin = open("woman.jpg", "rb") img = fin.read() return img
We read binary data from the filesystem. We have a jpg image called woman.jpg.
binary = psycopg2.Binary(data)
The data is encoded using the psycopg2 Binary
object.
cur.execute("INSERT INTO images(id, data) VALUES (1, %s)", (binary,) )
This SQL statement is used to insert the image into the database.
In this section, we are going to perform the reverse operation. We will read an image from the database table.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys def writeImage(data): try: fout = open('woman2.jpg','wb') fout.write(data) except IOError, e: print "Error %d: %s" % (e.args[0], e.args[1]) sys.exit(1) finally: if fout: fout.close() try: con = psycopg2.connect(database="testdb", user="janbodnar") cur = con.cursor() cur.execute("SELECT Data FROM images LIMIT 1") data = cur.fetchone()[0] writeImage(data) con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
We read image data from the images table and write it to another file, which we call woman2.jpg.
try: fout = open('woman2.jpg','wb') fout.write(data)
We open a binary file in a writing mode. The data from the database is written to the file.
cur.execute("SELECT Data FROM Images LIMIT 1") data = cur.fetchone()[0]
These two lines select and fetch data from the Images table. We obtain the binary data from the first row.
Metadata is information about the data in the database. Metadata in a PostgreSQL database contains information about the tables and columns, in which we store data. Number of rows affected by an SQL statement is a metadata. Number of rows and columns returned in a result set belong to metadata as well.
Metadata in PostgreSQL can be obtained using from the description property of the cursor object or from the information_schema table.
Next we will print all rows from the cars table with their column names.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect("dbname='testdb' user='janbodnar'") cur = con.cursor() cur.execute('SELECT * FROM cars') col_names = [cn[0] for cn in cur.description] rows = cur.fetchall() print "%s %-10s %s" % (col_names[0], col_names[1], col_names[2]) for row in rows: print "%2s %-10s %s" % row except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
We print the contents of the cars table to the console. Now, we include the names of the columns too. The records are aligned with the column names.
col_names = [cn[0] for cn in cur.description]
We get the column names from the description
property
of the cursor object.
print "%s %-10s %s" % (col_names[0], col_names[1], col_names[2])
This line prints three column names of the cars table.
for row in rows: print "%2s %-10s %s" % row
We print the rows using the for loop. The data is aligned with the column names.
$ ./colnames.py id name price 2 Mercedes 57127 3 Skoda 9000 4 Volvo 29000 5 Bentley 350000 6 Citroen 21000 7 Hummer 41400 8 Volkswagen 21600 1 Audi 62300
Output.
In the following example we will list all tables in the testdb database.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() cur.execute("""SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'""") rows = cur.fetchall() for row in rows: print row[0] except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
The code example prints all available tables in the current database to the terminal.
cur.execute("""SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'""")
The table names are stored inside the system information_schema
table.
$ ./list_tables.py cars images friends
These were the tables on our system.
We can export and import data using copy_to()
and copy_from()
methods.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None fout = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() fout = open('cars.sql','w') cur.copy_to(fout, 'cars', sep="|") except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) except IOError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close() if fout: fout.close()
In the above example, we copy the data from the cars table into the cars.sql file.
fout = open('cars.sql','w')
We open a file where we will write the data from the cars table.
cur.copy_to(fout, 'cars', sep="|")
The copy_to
method copies data from the cars table
to the opened file. The columns are separated with a | character.
$ cat cars.sql 2|Mercedes|57127 3|Skoda|9000 4|Volvo|29000 5|Bentley|350000 6|Citroen|21000 7|Hummer|41400 8|Volkswagen|21600 1|Audi|62300
The contents of the cars.sql file.
Now we are going to perform a reverse operation. We will import the dumped table back into the database table.
testdb=> DELETE FROM cars; DELETE 8
We delete the data from the cars table.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None f = None try: con = psycopg2.connect(database='testdb', user='janbodnar') cur = con.cursor() f = open('cars', 'r') cur.copy_from(f, 'cars', sep="|") con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) except IOError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close() if f: f.close()
In this script, we read the contents of the cars file and copy it back to the cars table.
f = open('cars', 'r') cur.copy_from(f, 'cars', sep="|") con.commit()
We open the cars file for reading and copy the contents to the cars table. The changes are committed.
testdb=> SELECT * FROM cars; id | name | price ----+------------+-------- 1 | Audi | 52642 2 | Mercedes | 57127 3 | Skoda | 9000 4 | Volvo | 29000 5 | Bentley | 350000 6 | Citroen | 21000 7 | Hummer | 41400 8 | Volkswagen | 21600 (8 rows)
The output shows, that we have successfully recreated the saved cars table.
A transaction is an atomic unit of database operations against the data in one or more databases. The effects of all the SQL statements in a transaction can be either all committed to the database or all rolled back.
In psycopg2 module transactions are handled by the connection class.
The first command of a connection cursor starts a transaction. (We do not need to enclose
our SQL commands by BEGIN and END statements to create a transaction.
This is handled automatically by psycopg2.) The
following commands are executed in the context of this new
transaction. In case of an error, the transaction is aborted and
no further commands are executed until the rollback()
method.
The documentation to the psycopg2 module says that the connection is responsible
to terminate its transaction, calling either the
commit()
or rollback()
method. Committed changes are immediately made persistent
into the database. Closing the connection using the close()
method or destroying
the connection object (using del or letting it fall out of scope) will
result in an implicit rollback()
call.
The psycopg2 module also supports an autocommit mode, where all changes to the tables are immediately effective. To run in autocommit mode, we set the autocommit property of the connection object to True.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') print con.autocommit cur = con.cursor() cur.execute("DROP TABLE IF EXISTS friends") cur.execute("CREATE TABLE friends(id serial PRIMARY KEY, name VARCHAR(10))") cur.execute("INSERT INTO friends(name) VALUES ('Tom')") cur.execute("INSERT INTO friends(name) VALUES ('Rebecca')") cur.execute("INSERT INTO friends(name) VALUES ('Jim')") cur.execute("INSERT INTO friends(name) VALUES ('Robert')") #con.commit() except psycopg2.DatabaseError, e: if con: con.rollback() print 'Error %s' % e sys.exit(1) finally: if con: con.close()
We create a friends table and try to fill it with data. However, as we will see, the data will be not committed.
#con.commit()
The commit()
method is commented. If we uncomment the
line, the data will be written to the table.
finally: if con: con.close()
The finally block is always executed. If we have not committed the
changes and no error occures (which would roll back the changes)
the transaction is still opened. The connection is closed with the
close()
method and the transaction is
terminated with an implicit call to the rollback()
method.
testdb=> \dt List of relations Schema | Name | Type | Owner --------+---------+-------+----------- public | cars | table | janbodnar public | friends | table | janbodnar public | images | table | janbodnar (3 rows)
Only after we have uncommented the line, the friends table is created.
In the autocommit mode, an SQL statement is executed immediately.
#!/usr/bin/python # -*- coding: utf-8 -*- import psycopg2 import sys con = None try: con = psycopg2.connect(database='testdb', user='janbodnar') con.autocommit = True cur = con.cursor() cur.execute("DROP TABLE IF EXISTS friends") cur.execute("CREATE TABLE friends(id serial PRIMARY KEY, name VARCHAR(10))") cur.execute("INSERT INTO friends(name) VALUES ('Jane')") cur.execute("INSERT INTO friends(name) VALUES ('Tom')") cur.execute("INSERT INTO friends(name) VALUES ('Rebecca')") cur.execute("INSERT INTO friends(name) VALUES ('Jim')") cur.execute("INSERT INTO friends(name) VALUES ('Robert')") cur.execute("INSERT INTO friends(name) VALUES ('Patrick')") except psycopg2.DatabaseError, e: print 'Error %s' % e sys.exit(1) finally: if con: con.close()
In this example, we connect to the database in the autocommit mode.
We don't call neither commit()
nor rollback()
methods.
con.autocommit = True
We set the connection to the autocommit mode.
$ ./autocommit.py testdb=> SELECT * FROM friends; id | name ----+--------- 1 | Jane 2 | Tom 3 | Rebecca 4 | Jim 5 | Robert 6 | Patrick (6 rows)
The data was successfully committed to the friends table.
This was the PostgreSQL Python tutorial.