Having worked with PostgreSQL continuously since 2001, I'd like to think there's nothing I don't know about backing up and restoring. However, experience shows that a) it's all too easy to develop "muscle memory" for a certain way of doing things, and b) PostgreSQL has a pesky habit of developing new nifty features which fly under the radar if you're not paying sufficient attention to the release notes, so any opportunity to review things from a fresh perspective is a welcome one.
I ordered the paper version of "PostgreSQL Backup and Restore How-to" from Amazon Japan for a tad under 2,000 yen, which is a little on the expensive side for what was described as a "booklet" (and four times the price of the eBook version). But I have a (meagre) book budget to burn, and I have this old-fashioned habit of scribbling notes in margins and making little bookmarks from Post-It notes; it's also nice to spend some time not staring at a screen. To my surprise it arrived less than 48 hours after ordering - upon closer examination it turned out the book was actually printed by Amazon in Japan, which is kind of nifty.
First impression: it's a very thin volume - 42 actual content pages - so is genuinely a booklet. On the other hand, bearing in mind I've got a bookshelf full of largely unread weighty tomes, less can be more.
The booklet's table of contents, as lifted directly from the publisher's site, is:
Getting a basic export (Simple)
Partial database exports (Simple)
Restoring a database export (Simple)
Obtaining a binary backup (Simple)
Stepping into TAR backups (Intermediate)
Taking snapshots (Advanced)
Synchronizing backup servers (Intermediate)
Restoring a binary backup (Simple)
Point in time recovery (Intermediate)
Warm and hot standby restore (Intermediate)
Streaming replication (Advanced)

In this article, we want to find the town with the greatest number of inhabitants near a given location.
We first need to find and import some data. I found a CSV listing of French cities with coordinates, population, and a few other numbers of interest for the exercise here.
To import the data set, we first need a table, then a COPY command:
CREATE TABLE lion1906 (
  insee text,
  nom text,
  altitude integer,
  code_postal text,
  longitude double precision,
  latitude double precision,
  pop99 bigint,
  surface double precision
);

\copy lion1906 from 'villes.csv' with csv header delimiter ';' encoding 'latin1'
With that data in place, we can find the 10 towns nearest to a town of our choosing; let's pick Villeurbanne, which is in the region of Lyon.
select code_postal, nom, pop99
  from lion1906
 order by point(longitude, latitude) <->
          (select point(longitude, latitude)
             from lion1906
            where nom = 'Villeurbanne')
 limit 10;

 code_postal |          nom           | pop99
-------------+------------------------+--------
 69100       | Villeurbanne           | 124215
 69300       | Caluire-et-Cuire       |  41233
 69120       | Vaulx-en-Velin         |  39154
 69580       | Sathonay-Camp          |   4336
 69140       | Rillieux-la-Pape       |  28367
 69000       | Lyon                   | 445452
 69500       | Bron                   |  37369
 69580       | Sathonay-Village       |   1693
 01700       | Neyron                 |   2157
 69660       | Collonges-au-Mont-d'Or |   3420
(10 rows)
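On a table this small a sequential scan is fine, but on a bigger data set a GiST expression index on the point should let PostgreSQL use the <-> operator for index-assisted nearest-neighbour ordering (KNN index ordering has been available since 9.1). A minimal sketch, with an index name of my own choosing:

create index lion1906_location_gist
    on lion1906
 using gist (point(longitude, latitude));

With this in place the planner can, depending on the plan, walk the index in distance order instead of sorting every row.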
We find Lyon in our list, and we now want the query to return only that one, as it has the greatest number of inhabitants in the list:
with neighbours as (
  select code_postal, nom, pop99
    from lion1906
   order by point(longitude, latitude) <->
            (select point(longitude, latitude)
               from lion1906
              where nom = 'Villeurbanne')
   limit 10
)
select *
  from neighbours
 order by pop99 desc
 limit 1;
GCC 4.8 was recently released. This is the first GCC release that is written in C++ instead of C. Which got me thinking ...
Would this make sense for PostgreSQL?
I think it's worth a closer look.
Much of GCC's job isn't actually that much different from PostgreSQL. It parses language input, optimizes it, and produces some output. It doesn't have a storage layer, it just produces code that someone else runs. Also note that Clang and LLVM are written in C++. I think it would be fair to say that these folks are pretty well informed about selecting a programming language for their job.
It has become apparent to me that C is approaching a dead end. Microsoft isn't updating their compiler to C99, advising people to move to C++ instead. So as long as PostgreSQL (or any other project, for that matter) wants to support that compiler, they will be stuck on C89 forever. That's a long time. We have been carefully introducing the odd post-C89 feature, guarded by configure checks and #ifdefs, but that will either come to an end, or the range of compilers that actually get the full benefit of the code will become narrower and narrower.
C++ on the other hand is still a vibrant language. New standards come out and get adopted by compiler writers. You know how some people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you could have that sort of problem for your C code! With C++ you reasonably might.
I'm also sensing that at this point there are more C++ programmers than C programmers in the world. So using C++ might help grow the project better. (Under the same theory that supporting Windows natively would attract hordes of Windows programmers to the project, which probably did not happen.)
Moving to C++ wouldn't mean that you'd have to rewrite all your code as classes or that you'd have to enter template hell. You could initially consider a C++ compiler a pickier C compiler, and introduce new language features one by one, as you had done before.
Most things that C++ is picky about are things that a C programmer would arguably be better off avoiding anyway.
PostgreSQL 9.3 will be coming out in beta soon and with that, some who want to experiment with both PostGIS and PostgreSQL 9.3 have asked if they can use PostGIS 2.0. The answer is NO. A lot of major changes happened in PostgreSQL 9.3 that required us to patch up upcoming PostGIS 2.1. These changes were not backported to 2.0 and I personally do not plan to back-port them unless lightning strikes, a big wad of cash falls from the sky, or for some reason we can't make the 2.1 cut before 9.3 comes out. So if you are planning to experiment with PostgreSQL 9.3, PLEASE use PostGIS 2.1 development branch. I will try to make sure we release 2.1 before PostgreSQL 9.3 comes out even if I have to resort to hitting some people over the head with a rubber bat :).
If you're ever in doubt about which versions of PostGIS work with which versions of PostgreSQL / GEOS / GDAL, please refer to the matrix that we try to keep up to date: http://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS.
Now some people might say "Isn't it cruel not to support PostGIS 2.0 for 9.3?", and my answer is "it's crueler to support it half-heartedly". The reason is simple. We have limited bandwidth for testing permutations of things, and the more permutations we support, the dirtier our code base becomes, making it harder to maintain, and the less time we can devote to properly testing each permutation. I'd rather say we don't support something than do a half-hearted job of supporting everything. On a slightly different, but also pragmatic, note: package maintainers (except for windows maintainers :)) generally only carry one version of PostGIS per version of PostgreSQL, and I'd rather users getting PostGIS from packages see our best foot than a two-year-old aging foot.
Note that going from PostGIS 2.0 to 2.1 is a soft upgrade, so you can install 2.1 on your existing PostgreSQL 9.2 without a dump/restore, and then you should be able to pg_upgrade over to 9.3 if your database is too big to dump and restore.
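If your PostGIS 2.0 install was done with CREATE EXTENSION, the soft upgrade should boil down to something like the following (the exact target version string depends on the 2.1 build you install, so treat this as a sketch):

ALTER EXTENSION postgis UPDATE TO "2.1.0";
-- and, if you use topology as well:
ALTER EXTENSION postgis_topology UPDATE TO "2.1.0";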
This is not the first time I've struggled getting SSL certificate validation to work, so I thought I'd better write down how I did it this time, to avoid wasting time in the future.
For security and convenience reasons, I want to do the signing of client certificates on a separate dedicated machine, also known as a certificate authority (CA). This allows us to grant new clients access without having to log in to the PostgreSQL server to sign certs or modify pg_hba.conf.
We will create a special database group called sslcertusers. All users in this group will be able to connect, provided they have a client certificate signed by the CA.
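On the database side, the group itself is just an ordinary role without login. Something along these lines should do it (the user names alice and bob are only examples):

CREATE ROLE sslcertusers NOLOGIN;
-- create a user directly in the group ...
CREATE ROLE alice LOGIN IN ROLE sslcertusers;
-- ... or add an existing user to it
GRANT sslcertusers TO bob;

In pg_hba.conf the group can then be referenced as +sslcertusers on the relevant hostssl line.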
In the example below, replace “trustly” with the name of your company/organisation.
1. CA
The CA should be an offline computer locked in a safe.
1.1. Generate CA private key:
sudo openssl genrsa -des3 -out /etc/ssl/private/trustly-ca.key 2048
sudo chown root:ssl-cert /etc/ssl/private/trustly-ca.key
sudo chmod 640 /etc/ssl/private/trustly-ca.key
1.2. Generate CA public certificate:
sudo openssl req -new -x509 -days 3650 \
-subj '/C=SE/ST=Stockholm/L=Stockholm/O=Trustly/CN=trustly' \
-key /etc/ssl/private/trustly-ca.key \
-out /usr/local/share/ca-certificates/trustly-ca.crt
sudo update-ca-certificates
2. PostgreSQL-server
2.1. Generate PostgreSQL-server private key:
# Remove default snakeoil certs
sudo rm /var/lib/postgresql/9.1/main/server.key
sudo rm /var/lib/postgresql/9.1/main/server.crt
# Enter a passphrase
sudo -u postgres openssl genrsa -des3 -out /var/lib/postgresql/9.1/main/server.key 2048
# Remove the passphrase
sudo -u postgres openssl rsa -in /var/lib/postgresql/9.1/main/server.key \
  -out /var/lib/postgresql/9.1/main/server.key
sudo -u postgres chmod 400 /var/lib/postgresql/9.1/main/server.key
2.2. Request CA to sign PostgreSQL-server key:
sudo -u postgres openssl req -new -nodes -key /var/lib/postgresql/9.1/main/server.key \
  -days 3650 -out /tmp/server.csr \
  -subj '/C=SE/ST=Stockholm/L=Stockholm/O=Trustly/CN=postgres'
2.3. Sign PostgreSQL-server key with CA private key:
sudo openssl req -
A little while ago, I wrote to the PostgreSQL general mailing list that I’d been approached by Packt Publishing to contribute a quick manual on doing PostgreSQL backups: Instant PostgreSQL Backup and Restore How-to. They’re the same guys who published Greg Smith’s PostgreSQL 9.0 High Performance book which everyone seems to swear by.
The goal of the backup book was to distill the PostgreSQL documentation, tools, and Wiki down to a collection of short step-by-step guides much like the O’Reilly nutshell series. A lot of the backup recipes we DBAs trade back and forth as a matter of course may not be so obvious, and I threw a couple tricks in for advanced users, to boot.
Well, here’s your chance to snag a free copy! The marketing folks have given the go-ahead to hold a giveaway, and have set aside four copies for lucky winners. A longer description of the How-to is on their site.
All they want to know is what you, as a prospective reader, find most interesting or potentially useful about the book. My comment section will be open until May 8th for replies along these lines. If you are selected, Packt will email you with information on how to get your free digital copy. If your comment catches our attention, you’re one step closer. If you want a print copy, they’re available from Amazon separately.
So remember: the comment section is open until May 8th. I look forward to forcing Packt to do some community service by handing out free copies of the book, and you should too.
When: 7-9pm Thu May 16, 2013
Where: Iovation
Who: Group
What: Lightning Talks
Recent presentations have sparked several conversations of the “no sh–, there I was” variety, so for May we’ll be doing Lightning Talks. Bring your stories of wizardry, abject failure, and “Hail Mary” passes. We do request that your topic have something to do with Postgres, but that’s the only requirement – you don’t even need to have slides if you don’t want to. If you’ve never given a talk before, a lightning talk is a great low-risk way to try it out!
–
Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!
Building security will close access to the floor at 7:30.
After-meeting beer location: Huber’s. See you there!
Two people might be simultaneously modifying the same database function. To protect against this scenario, I created a little tool to stage the source code of all functions in pg_proc and the definition of all views. You then deploy your changes, and afterwards you call fdiff() to show a diff of the changes made, by comparing the new content of pg_proc and the definitions of all views. If the changes are what you expected, you go ahead and commit.
We’ve been using this methodology every day at both my current company and my previous one, and it has prevented us from making mistakes.
This is how to use it:
BEGIN;
SELECT fstage();
CREATE OR REPLACE FUNCTION ...
CREATE OR REPLACE VIEW ...
SELECT fdiff();
COMMIT;
Example of the output from fdiff():
+-------------------------------+
| Updated or replaced functions |
+-------------------------------+
Schema................: public
Name..................: process_order
Argument data types...: _orderid bigint
Result data type......: boolean
Language..............: plpgsql
Type..................: normal
Volatility............: STABLE
Owner.................: amazon
20 c OR Orders.ShippingDate IS NOT NULL
20 c OR Orders.ShippingDate > now() - interval '2 month'
Backing up every database on every version and each running cluster on a host in a single script - and also encrypting the backup files on the fly and generating a script for retrieving the encrypted backup files from the backup host... :)
In my previous blog entry I didn't explain any of the prerequisites for using the script, so I'll try to cover those below.
I have a PostgreSQL database running in a remote location (the backend for some web apps/websites), and I wanted to expand my backup script so that I could download my (encrypted) backup files from this location over plain HTTP.
So I have a Linux box (an HTTP server) that I use as my backup host. On this box I create a backup user:
root@httpserver:~# useradd -m <backupusername>
So now I have a user that is locked (no password) with a home directory in /home/<backupusername>. I only want to be able to log on with a key, so this is fine - but it's easier to give the backup user a temporary password so that I can use ssh-copy-id to transfer the key. So let's do that:
root@httpserver:~#passwd <backupusername>
Then, on my PostgreSQL Linux box (logged in as postgres), I generate an SSH key pair (and do this with no passphrase!):
postgres@postgresqlserver:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.
The key fingerprint is:.....
And then I transfer this key to the backup host:
postgres@postgresqlserver:~$ ssh-copy-id <backupusername>@httpserver
<backupusername>@httpserver's password:
Now try logging into the machine, with "ssh '<backupusername>@httpserver'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
And if I try to log on:
postgres@postgresqlserver:~$ ssh <backupusername>@httpserver
I should now be logged in without being asked for a password.
PostgreSQL has a pretty decent user base in India. The Indian government, a host of companies, and freelancers employ it as a database backend of choice. Pune city in India has a huge concentration of PostgreSQL developers/users, with many PostgreSQL-affiliated companies having their development centers set up here.
In Happy Valentine PostGIS we demonstrated how to use PostGIS raster to decipher letters from a raster, vectorize them, and then reuse these vectorized letters to form new words. Admittedly the letters were a bit grainy since they were vectorizations of low-res rasters and I didn't bother smoothing them. Bruce Rindahl offered a script for converting SVG to PostGIS geometry, using Batik to convert a font file to SVG format, and gave me a hi-res converted kankin font set. I still haven't figured out how his script works.
Bborie Park thought that was all too complicated and thought (as I always have) that we need an ST_GeomFromSVG function for PostGIS, which he is on a mission to create when he's less busy. He also suggested I wrap my letter-writer function as an extension. Taking all these ideas, I formulated an extension you install with
CREATE EXTENSION postgis_letters;
postgis_letters (http://www.bostongis.com/postgisstuff/postgis_letters_extension.zip) is an SQL / data extension containing mostly data but, as the name suggests, relying on PostGIS. The data are geometry vectors of the kankin font. I plan to add more free fonts later, once I figure out how to use Bruce's script or Bborie comes up with a better way, and also more positioning logic and handling of spaces. So it's a little rough at the moment.
The purpose of the extension is so I can write words on my images in reports, e.g. state names, or overlay labels on geometry features like roads and land. Using the power of both geometry and raster you can have a fully functioning report image writer that returns a fully formed image for use in OpenOffice (or, for my ASP.NET web apps, Active Reports.NET). This wouldn't rely on any mapping server to draw images (just pure PostGIS/PostgreSQL). Hopefully, with new and improved binary features coming in psql (it looks like for 9.4), outputting these raster images from psql will also be trivial.
While on my mission to do something useful, I got distracted by something more entertaining: describing spatial processes with words. He
I just watched a presentation from Mar 11, 2012, where Paul Graham / Y Combinator gives away seven billion-dollar startup ideas:
Idea 6 on the list is basically about the need to invent the sufficiently smart compiler, capable of understanding how to automatically break down a problem (expressed in code) into pieces which can be executed in parallel, in order to utilize GPUs and/or a lot of CPU cores.
Building such a compiler is probably extremely difficult, if possible at all, perhaps mainly due to the complexity that can be expressed in C, C++, Python, Java, or any normal imperative language.
Imagine instead the same problem for a declarative language like SQL. Would it be equally difficult? Core member Bruce Momjian of the PostgreSQL team has already expressed his ideas on how to tackle the problem.
When thinking about it, the query planner is a bit like a compiler, kind of. It takes the SQL and figures out a smart way to execute it, breaking it down into a lot of sub-tasks, of which some could with benefit be executed in parallel.
So let’s hope the great pg-hackers will pull this off, so we don’t have to pay some future start-up compiler-billionaire license fees to get automatic parallelism.
Continuous Integration (CI) using automated open source tools such as Jenkins and Hudson is rapidly gaining adoption. These tools help developers gain confidence to create more robust code rapidly by improving the testing and QA process. The flexibility of this software adds other challenges for the DBAs!
One of our clients came across the challenge of cleaning up databases older than X days from the Jenkins CI database cluster, because each run creates a separate database and the database names are not standard (they are provided by users). If they don't clean up old databases, the cluster will have hundreds of databases by the end of the week. We tried to standardize database names, but you can't stop users from making mistakes or inputting arbitrary db names. On the other hand, Postgres's system catalog view doesn't provide a database creation date. How can I find databases older than X days and drop them?
I came across this blog entry that answers my question, but I was looking for an easier way - and I found one, getting the database creation time with a single query! Yay :-) Following is the query that can be used to find the database creation time. The query should return the correct created date as long as you haven't run pg_upgrade on the data directory. I thought I'd share it here so it will be useful for others!
SELECT datname, (pg_stat_file('base/'||oid||'/PG_VERSION')).modification AS datcreated
FROM pg_database;
postgres=# SELECT datname, (pg_stat_file('base/'||oid||'/PG_VERSION')).modification AS datcreated
postgres-# FROM pg_database;
datname | datcreated
------------+------------------------
template1 | 2013-03-28 16:04:13-04
template0 | 2013-03-28 16:04:14-04
postgres | 2013-03-28 16:04:14-04
rangetypes | 2013-03-28 16:14:42-04
puppet | 2013-03-28 16:23:13-04
omniti | 2013-04-20 10:02:22-04
(6 rows)
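Building on that, finding the databases older than X days is just a WHERE clause away. A sketch, with 7 days picked as an example cutoff (note that pg_stat_file() requires superuser privileges):

SELECT datname
  FROM pg_database
 WHERE NOT datistemplate
   AND (pg_stat_file('base/'||oid||'/PG_VERSION')).modification
       < now() - interval '7 days';

The resulting list can then be fed to DROP DATABASE from the cleanup job.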
Ideally, the pg_database system catalog view should include a database_created timestamp! Hopefully, that day will come sooner rather than later.
Feel free to comment if you have any other ideas for getting this detail, or if you see any corner cases with the above.
There are a lot of tools for PostgreSQL development. I want to propose one more: Lazarus.
Go to Lazarus at SourceForge.
Then, in the Downloads section, get the proper installer. I used lazarus-1.0.8-fpc-2.6.2-win32.exe and lazarus-1.0.8-fpc-2.6.2-win64.exe in my tests. These are the latest releases at the moment.
Here are the screen shots of the installation process where you can see what I had chosen:
I have Lazarus installed into C:\lazarus\ and hereafter I will use this path.
Now it’s time for PostgresDAC. At the moment many Lazarus IDE functions require the source and warn if the source of a unit is missing. Thus we need the PostgresDAC version with sources. Sources are available to licensed users in their profile.
I have extracted PostgresDAC sources to C:\PostgresDAC\.
In the Lazarus IDE choose Open Package and select dclPostgresDACL.lpk. This is the design time package. Then in the Package window choose Use → Install.
Confirmation about rebuild will appear.
Choose “Yes”. Wait for some time. After successful rebuild Lazarus will reopen itself. Tada! We got it!
Now we may open some demo projects and play some time.
The package is available on The PostgreSQL Experts Inc Github Repository and also on PGXN.
andrew=# create extension unnest_ordinality;
andrew=# select * from unnest_ordinality('{a,b,c,d,e,f,g}'::text[]);
element_number | element
----------------+---------
1 | a
2 | b
3 | c
4 | d
5 | e
6 | f
7 | g
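Since the order of a query result is never guaranteed without an ORDER BY, the element_number column is what lets you get the elements back in their original array positions. For example, using the two columns shown above:

select element
  from unnest_ordinality('{c,a,b}'::text[])
 order by element_number;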
In almost any application it's common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a hash or dictionary, appending each new value to the appropriate key. Hopefully, it's not a surprise that there's a much better way to do this with PostgreSQL.
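As a quick taste of what that better way looks like, the array_agg and string_agg aggregates can collapse a set of rows into an array or a comma separated string in a single query; a trivial, hypothetical example:

SELECT array_agg(n)              AS as_array,
       string_agg(n::text, ', ') AS as_csv
  FROM generate_series(1, 5) AS n;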
Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of the functions to get the functionality you need. Let's take a look at an example schema and use case.
Given a project management application, you may have users who have projects that have tasks. An example piece of functionality might be to send an email with a list of all projects that have tasks that are past their due dates of completion. Your schema might look something like this:
# \d users
Table "public.users"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
email | character varying(255) |
...
# \d projects
Table "public.projects"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer | not null
user_id | integer | not null
name | character varying(255) | not null
...
# \d tasks
Table "public.tasks"
Column | Type | Modifiers
--------------+-----------------------------+-----------
id | integer | not null
project_id | integer | not null
completed_at | timestamp without time zone |
due_at | timestamp without time zone |
...
To get a list of all projects that have tasks that haven't been completed, you would start by joining the users, projects, and tasks tables and filtering down to the overdue, incomplete tasks.
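From there, string_agg can collapse each user's overdue project names into a single comma separated value. A sketch of one plausible query, using only the table and column names shown above:

SELECT users.email,
       string_agg(DISTINCT projects.name, ', ') AS overdue_projects
  FROM users
  JOIN projects ON projects.user_id = users.id
  JOIN tasks    ON tasks.project_id = projects.id
 WHERE tasks.completed_at IS NULL
   AND tasks.due_at < now()
 GROUP BY users.email;

This returns one row per user, ready to be dropped straight into the reminder email, with no application-side looping required.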