PostgreSQL
The world's most advanced open source database
Posted by Ian Barwick on 2013-05-03 at 13:01:00

Having worked with PostgreSQL continuously since 2001, I'd like to think there's nothing I don't know about backing up and restoring. However, experience shows that a) it's all too easy to develop "muscle memory" for a certain way of doing things, and b) PostgreSQL has a pesky habit of developing new nifty features which fly under the radar if you're not paying sufficient attention to the release notes, so any opportunity to review things from a fresh perspective is a welcome one.

I ordered the paper version of "PostgreSQL Backup and Restore How-to" from Amazon Japan for a tad under 2,000 yen, which is a little on the expensive side for what was described as a "booklet" (and four times the price of the eBook version). But I have a (meagre) book budget to burn, I have this old-fashioned habit of scribbling notes in margins and making little bookmarks from Post-It notes, and it's nice to spend some time not staring at a screen. To my surprise it arrived less than 48 hours after ordering - upon closer examination it turned out the book was actually printed by Amazon in Japan, which is kind of nifty.

First impression: it's a very thin volume - 42 actual content pages - so is genuinely a booklet. On the other hand, bearing in mind I've got a bookshelf full of largely unread weighty tomes, less can be more.

The booklet's table of contents, as lifted directly from the publisher's site, is:

  • Getting a basic export (Simple)
  • Partial database exports (Simple)
  • Restoring a database export (Simple)
  • Obtaining a binary backup (Simple)
  • Stepping into TAR backups (Intermediate)
  • Taking snapshots (Advanced)
  • Synchronizing backup servers (Intermediate)
  • Restoring a binary backup (Simple)
  • Point in time recovery (Intermediate)
  • Warm and hot standby restore (Intermediate)
  • Streaming replication (Advanced)

more...

Posted by Francisco Figueiredo Jr on 2013-05-03 at 02:22:33


Since the beginning, in 2002, Npgsql has been using CVS as its source code management system (SCM). At the time, CVS was being used by a lot of open source projects, and so it was a natural choice for Npgsql.

A couple of days ago, the Npgsql code moved to GitHub. I didn't blog about it on the same day because I wanted to make sure things went well before spreading the word. :) (You may be asking yourself: but wasn't the Npgsql code already on GitHub? Yes, it was. But it was only a mirror of the main CVS repository, and I had to update the code manually every time. Every change had to go to the main CVS repository first, and then I would update GitHub. Obviously the code was often outdated. This is a thing of the past now.)

Git was chosen mainly because there is a lot of documentation about it, because it is powerful, and because of GitHub. GitHub provides many resources which will help us build a much better environment for our collaborators and users.

For our collaborators there is the idea of cloned repositories and the ability to give us much better feedback through pull requests. We will be able to apply changes from our collaborators much more easily based on those pull requests. Besides that, our collaborators will be able to use all the power of git to get a better workflow when playing with the Npgsql code.

Users will also get a lot of benefits. GitHub provides links to download code from each branch. This will allow users who want to test an unreleased version of Npgsql to try out new features without having to either install git or wait for us to create a beta version.

I'm very confident this change will bring a lot of benefits to the Npgsql community. I hope you enjoy this change as much as I do.

I'd like to thank a person who contributed a lot to making this happen. I'm sure that, without his help, this migration would have taken a lot more time: Shay Rojansky. Shay has been helping me since the beginning and gave me all the support I needed to get the Npgsql code onto GitHub. Thank you very much, Shay!

Also, I'd

[continue reading]

Posted by Dimitri Fontaine in 2ndQuadrant on 2013-05-02 at 09:34:00

In this article, we want to find the town with the greatest number of inhabitants near a given location.

A very localized example

We first need to find and import some data, and I found a CSV listing of French cities with coordinates, population and some other numbers of interest for the exercise here.

To import the data set, we first need a table, then a COPY command:

CREATE TABLE lion1906 (
  insee       text,
  nom         text,
  altitude    integer,
  code_postal text,
  longitude   double precision,
  latitude    double precision,
  pop99       bigint,
  surface     double precision
);

\copy lion1906 from 'villes.csv' with csv header delimiter ';' encoding 'latin1'

With that data in place, we can find the 10 towns nearest to a place of our choosing; let's pick Villeurbanne, which is in the region of Lyon.

   select code_postal, nom, pop99
     from lion1906
 order by point(longitude, latitude) <->
          (select point(longitude, latitude)
             from lion1906
            where nom = 'Villeurbanne')
    limit 10;

 code_postal |          nom           | pop99
-------------+------------------------+--------
 69100       | Villeurbanne           | 124215
 69300       | Caluire-et-Cuire       |  41233
 69120       | Vaulx-en-Velin         |  39154
 69580       | Sathonay-Camp          |   4336
 69140       | Rillieux-la-Pape       |  28367
 69000       | Lyon                   | 445452
 69500       | Bron                   |  37369
 69580       | Sathonay-Village       |   1693
 01700       | Neyron                 |   2157
 69660       | Collonges-au-Mont-d'Or |   3420
(10 rows)

We find Lyon in our list there, and now we want the query to return only that one, as it has the greatest number of inhabitants in the list:

with neighbours as (
   select code_postal, nom, pop99
     from lion1906
 order by point(longitude, latitude) <->
          (select point(longitude, latitude)
             from lion1906 where nom = 'Villeurbanne')
    limit 10
)
  select *
    from neighbo

[continue reading]
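
The excerpt cuts off mid-query; as a hedged sketch (an illustration, not the post's actual continuation), the final step could simply keep the most populated town from the CTE:

with neighbours as (
   select code_postal, nom, pop99
     from lion1906
 order by point(longitude, latitude) <->
          (select point(longitude, latitude)
             from lion1906 where nom = 'Villeurbanne')
    limit 10
)
  select code_postal, nom, pop99
    from neighbours
order by pop99 desc
   limit 1;

Against the data shown above, this keeps the single row for Lyon (pop99 = 445452) out of the ten nearest towns.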

psql is the native client of PostgreSQL, widely used by application developers and database administrators on a daily basis for common operations when interacting with a PostgreSQL server. With a full set of integrated functionalities, it is among the most popular (if not the number one) client applications in the Postgres community. If you are a [...]
Posted by Peter Eisentraut on 2013-05-02 at 03:13:03

GCC 4.8 was recently released. This is the first GCC release that is written in C++ instead of C. Which got me thinking ...

Would this make sense for PostgreSQL?

I think it's worth a closer look.

Much of GCC's job isn't actually that much different from PostgreSQL. It parses language input, optimizes it, and produces some output. It doesn't have a storage layer, it just produces code that someone else runs. Also note that Clang and LLVM are written in C++. I think it would be fair to say that these folks are pretty well informed about selecting a programming language for their job.

It has become apparent to me that C is approaching a dead end. Microsoft isn't updating their compiler to C99, advising people to move to C++ instead. So as long as PostgreSQL (or any other project, for that matter) wants to support that compiler, they will be stuck on C89 forever. That's a long time. We have been carefully introducing the odd post-C89 feature, guarded by configure checks and #ifdefs, but that will either come to an end, or the range of compilers that actually get the full benefit of the code will become narrower and narrower.

C++ on the other hand is still a vibrant language. New standards come out and get adopted by compiler writers. You know how some people require Java 7 or Python 2.7 or Ruby 1.9 for their code? You wish you could have that sort of problem for your C code! With C++ you reasonably might.

I'm also sensing that at this point there are more C++ programmers than C programmers in the world. So using C++ might help grow the project better. (Under the same theory that supporting Windows natively would attract hordes of Windows programmers to the project, which probably did not happen.)

Moving to C++ wouldn't mean that you'd have to rewrite all your code as classes or that you'd have to enter template hell. You could initially consider a C++ compiler a pickier C compiler, and introduce new language features one by one, as you had done before.

Most things that C++ is picky about are things that a C programme

[continue reading]

Posted by Chris Travers on 2013-05-01 at 14:28:48
Arrays in PostgreSQL are relatively well supported by the indexing engines of this RDBMS.  There are a fair number of options however to be aware of and an understanding of how arrays are indexed is very helpful.  Before going forward, please read my intro on arrays so that you understand the basics.

I am leaving out hash indexes which are not very useful for arrays (or much of anything else).  There are probably cases where hash indexes are useful but if you need them you will know....

BTREE Indexing for Ordinality-centric arrays


The default type of index used by PostgreSQL is a btree index, which provides a search tree using only three operators: <, =, >.  Ordering and ordinality of arrays here is quite important and btree indexes are essentially useless when indexing arrays where ordinality is unimportant.

One should remember that btree indexes are useful primarily in ordering based searches.  The following could use a btree index (and might depending on table statistics):

  • SELECT id FROM mytable WHERE myarray < '{0,2,1,2,3}';
  • SELECT id FROM mytable WHERE myarray = '{0,6,5,4,3}';
In theory a prefix search might be possible if such an operator existed and appropriate work was done on the back-end, but this is not currently possible without using a functional index.  Again, in theory the following might work, but GIN indexes would be better for reasons that will become clear.

  • SELECT id FROM mytable WHERE myarray[1] = '5';

The following cannot work with a btree index under any circumstances:

  • SELECT id FROM mytable WHERE myarray[2] = 5;
The reason this cannot work is that, while myarray[1] would correspond to something we could search the index on, myarray[2] would not be exposed to the tree view of the index until we already know myarray[1].  Thus we cannot traverse a tree on the basis of array elements beyond the first.
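
For illustration, here is a minimal sketch of the expression (functional) index approach mentioned above, which lets a btree answer equality searches on the first element; the table definition is an assumption since the post does not show it:

-- hypothetical table with an integer array column
CREATE TABLE mytable (
    id      serial PRIMARY KEY,
    myarray int[]
);

-- expression index on the first element only
CREATE INDEX mytable_myarray_first_idx ON mytable ((myarray[1]));

-- this predicate can now use the index (subject to planner statistics)
SELECT id FROM mytable WHERE myarray[1] = 5;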

Btree indexes may be useful in some cases with arrays, particularly where ordinality is very important to the search.  However, they are entirely useless where ordinality is unimpor

[continue reading]

PostgreSQL 9.3 will be coming out in beta soon and with that, some who want to experiment with both PostGIS and PostgreSQL 9.3 have asked if they can use PostGIS 2.0. The answer is NO. A lot of major changes happened in PostgreSQL 9.3 that required us to patch up upcoming PostGIS 2.1. These changes were not backported to 2.0 and I personally do not plan to back-port them unless lightning strikes, a big wad of cash falls from the sky, or for some reason we can't make the 2.1 cut before 9.3 comes out. So if you are planning to experiment with PostgreSQL 9.3, PLEASE use PostGIS 2.1 development branch. I will try to make sure we release 2.1 before PostgreSQL 9.3 comes out even if I have to resort to hitting some people over the head with a rubber bat :).

If ever in doubt what versions of PostGIS works with what versions of PostgreSQL /GEOS / GDAL, please refer to the matrix that we try to keep up to date. http://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS.

Now some people might say "Isn't it cruel not to support PostGIS 2.0 for 9.3?", and my answer is "it's crueler to". The reason is simple. We have limited bandwidth for testing permutations of things, and the more permutations we support, the dirtier our code base becomes, making it harder to maintain, and the less time we can devote to properly testing each permutation. I'd rather say we don't support something than do a half-hearted job of supporting everything. On a slightly different, but also pragmatic note, package maintainers (except for Windows maintainers :)) generally only carry one version of PostGIS per version of PostgreSQL, and I'd rather users getting PostGIS from packages see our best foot than a two-year-old aging foot.

Note that going from PostGIS 2.0 to 2.1 is a soft upgrade, so you can install 2.1 on your existing PostgreSQL 9.2 without a dump/restore, and then you should be able to pg_upgrade over to 9.3 if your database is too big to dump and restore.
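
If PostGIS was installed via CREATE EXTENSION, the soft upgrade mentioned above would look roughly like the sketch below; the exact target version string depends on the 2.1 release you install, so treat it as illustrative rather than official upgrade documentation:

-- on the existing PostgreSQL 9.2 cluster, before pg_upgrading to 9.3
ALTER EXTENSION postgis UPDATE TO '2.1.0';
-- if you also use topology
ALTER EXTENSION postgis_topology UPDATE TO '2.1.0';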

This is part of a two-part series.  The second will discuss indexing arrays, which is a complex topic.

PostgreSQL has very good support for arrays of other data types.  One can have nested data types where an attribute of a relation stores an array of tuples, and each of these tuples can hold arrays of other tuples or primitives.

Arrays are extremely powerful in PostgreSQL but they pose a number of significant issues as well.  They can be used in ways which break first normal form, but mere use of an array does not necessarily break first normal form's atomicity requirement, because arrays differ in significant, fundamental ways from relations.  Understanding arrays in PostgreSQL can help avoid costly errors.  Relations and arrays are subject to fundamentally different constraints and therefore are equivalent in only a few very special cases.

Arrays derive their power in PostgreSQL due to the fact that they have strong definitional requirements.

An Array is a Mathematical Matrix


PostgreSQL arrays follow the basic definition of a mathematical matrix.  The array must be rectangular, so if one element contains an array, all other elements must contain an array of the same dimensions.  All members of an array must be of the same data type.

The following are thus valid Integer arrays:

  • '{1,2,3,4,5}'
  • '{{1,2},{3,4},{5,6},{7,8},{9,10}}'

The following are not valid integer arrays:

  • '{1,2,3,{4, 5}}'
  • '{{1,2},{3,4},{5}}'
These constraints make arrays particularly useful in PostgreSQL.  One can use text arrays to represent sets of tuples being input into stored procedures for example.  One can also write functions to do matrix math on numerical arrays (we will look at an example below).  If PostgreSQL lacked true arbitrary precision math via the numeric type, we could implement something similar using arrays.  These checks all make arrays safe to work with in ways that most programming languages do not.
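
A quick way to see these constraints in action is to cast such literals directly (a minimal illustration; the error text is paraphrased from what PostgreSQL reports):

-- rectangular: accepted
SELECT '{{1,2},{3,4},{5,6}}'::int[];

-- ragged: rejected with an error along the lines of
-- "multidimensional arrays must have array expressions with matching dimensions"
SELECT '{{1,2},{3,4},{5}}'::int[];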

Arrays vs Relations vs Tuples (and First Normal Form)


Arrays bear some relationship to relations and tuples, but all

[continue reading]

Last time I wrote about what explain output shows. Now I'd like to talk more about various types of "nodes" / operations that you might see in explain plans. The most basic one is Seq Scan. It looks like this: EXPLAIN analyze SELECT * FROM pg_class; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan ON pg_class (cost=0.00..10.92 ROWS=292 [...]

This is not the first time I’ve struggled getting SSL certificate validation to work,
so I thought that this time I had better write down how I did it, to avoid future time-waste.

For security and convenience reasons, I want to do the signing of client certificates
on a separate dedicated machine, also known as certificate authority (CA).

This allows us to grant new clients access without having to log in to the PostgreSQL
server to sign certs or modify pg_hba.conf.

We will create a special database group called sslcertusers.
All users in this group will be able to connect provided they have a
client certificate signed by the CA.
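
The SQL side of this step is not shown in the excerpt, but a minimal sketch might look like the following; the user name is an assumption, and the pg_hba.conf line is only indicated in a comment:

-- group role that marks users allowed to connect with a client certificate
CREATE ROLE sslcertusers NOLOGIN;

-- example application user added to the group (name is illustrative)
CREATE ROLE appuser LOGIN;
GRANT sslcertusers TO appuser;

-- pg_hba.conf would then typically contain a "hostssl ... +sslcertusers ... cert"
-- line, so that group membership plus a CA-signed certificate is required.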

In the example below, replace “trustly” with the name of your company/organisation.

1. CA
The CA should be an offline computer locked in a safe.

1.1. Generate CA private key:

sudo openssl genrsa -des3 -out /etc/ssl/private/trustly-ca.key 2048
sudo chown root:ssl-cert /etc/ssl/private/trustly-ca.key
sudo chmod 640 /etc/ssl/private/trustly-ca.key

1.2. Generate CA public certificate:

sudo openssl req -new -x509 -days 3650 \
-subj '/C=SE/ST=Stockholm/L=Stockholm/O=Trustly/CN=trustly' \
-key /etc/ssl/private/trustly-ca.key \
-out /usr/local/share/ca-certificates/trustly-ca.crt
sudo update-ca-certificates

2. PostgreSQL-server
2.1. Generate PostgreSQL-server private key:

# Remove default snakeoil certs
sudo rm /var/lib/postgresql/9.1/main/server.key
sudo rm /var/lib/postgresql/9.1/main/server.crt
# Enter a passphrase
sudo -u postgres openssl genrsa -des3 -out /var/lib/postgresql/9.1/main/server.key 2048
# Remove the passphrase
sudo -u postgres openssl rsa -in /var/lib/postgresql/9.1/main/server.key -out /var/lib/postgresql/9.1/main/server.key
sudo -u postgres chmod 400 /var/lib/postgresql/9.1/main/server.key

2.2. Request CA to sign PostgreSQL-server key:

sudo -u postgres openssl req -new -nodes -key /var/lib/postgresql/9.1/main/server.key -days 3650 -out /tmp/server.csr -subj '/C=SE/ST=Stockholm/L=Stockholm/O=Trustly/CN=postgres'

2.3. Sign PostgreSQL-server key with CA private key:

sudo openssl req -

[continue reading]

Posted by Shaun M. Thomas on 2013-04-26 at 15:53:04

A little while ago, I wrote to the PostgreSQL general mailing list that I’d been approached by Packt Publishing to contribute a quick manual on doing PostgreSQL backups: Instant PostgreSQL Backup and Restore How-to. They’re the same guys who published Greg Smith’s PostgreSQL 9.0 High Performance book which everyone seems to swear by.

The goal of the backup book was to distill the PostgreSQL documentation, tools, and Wiki down to a collection of short step-by-step guides much like the O’Reilly nutshell series. A lot of the backup recipes we DBAs trade back and forth as a matter of course may not be so obvious, and I threw a couple tricks in for advanced users, to boot.

Well, here’s your chance to spring a free copy! The marketing folks have given the go-ahead to hold a giveaway, and have set aside four copies for lucky winners. A longer description of the How-to is on their site.

All they want to know is what you, as a prospective reader, find most interesting or potentially useful about the book. My comment section will be open until May 8th for replies along these lines. If you are selected, Packt will email you with information on how to get your free digital copy. If your comment catches our attention, you’re one step closer. If you want a print copy, they’re available from Amazon separately.

So remember:

  • Free book
  • What interests you about it?
  • Submit a comment
  • You’re entered

I look forward to forcing Packt to do some community service by handing out free copies of the book, and you should too. :)

Posted by Josh Berkus in pgExperts on 2013-04-26 at 00:59:24
We're having an Unconference Day at pgCon 2013!  As far as I know, this is a first for the PostgreSQL community, so I've prepared some information on what the unconference is and how it happens. The Unconference Day is taking place on Saturday, May 25, after the main session days at pgCon, so I hope you bought your plane tickets to stay over Saturday! 

For those of you who are unfamiliar, an Unconference is a participant-driven meeting. Typically at an unconference, the agenda is created by the attendees at the beginning of the meeting. Anyone who wants to initiate a discussion on a topic can claim a time and a space. Unconferences typically feature open discussions rather than having a single speaker at the front of the room giving a talk, although any format is permitted.

This means that we want you -- yes, you -- to lead a session at the Unconference Day.  Don't be shy!

pgCon is the primary venue for PostgreSQL contributors around the world to collaborate in person. As pgCon has grown, it has added additional ways to be involved in PostgreSQL development, including the Developer Meeting, the Cluster-Hackers Summit, the PostgresXC Summit, the Schemaverse Tournament, and many other satellite events.

This Unconference Day will permit us additional collaboration time, in order to work on many things which didn't make it into the formal pgCon program, including:
  • open or round-table discussions among community teams, such as the Web Team, Advocacy Team, Buildfarm hosts, and others.
  • new or emerging topics which came up after the pgCon Call for Papers was closed.
  • spillover topics from things which came up during pgCon
  • development planning and coordination for contributors who were not invited to the Developer Meeting
  • active working or hacking sessions
Most of the Unconference Day will be planned between 10am and 11am on the day of.  However, there are a couple things you could do right now to help get ready for it:
  1. If you have an idea for a session at the Unconference Day, add it to the wiki page.
  2. If you are

[continue reading]

Posted by gabrielle roth on 2013-04-26 at 00:47:06

When: 7-9pm Thu May 16, 2013
Where: Iovation
Who: Group
What: Lightning Talks

Recent presentations have sparked several conversations of the “no sh–, there I was” variety, so for May we’ll be doing Lightning Talks. Bring your stories of wizardry, abject failure, and “Hail Mary” passes. We do request that your topic have something to do with Postgres, but that’s the only requirement – you don’t even need to have slides if you don’t want to. If you’ve never given a talk before, a lightning talk is a great low-risk way to try it out!

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Building security will close access to the floor at 7:30.

After-meeting beer location: Huber’s. See you there!


Around the time that Xzilla wrote about the book, Packt contacted me and asked for a review. Since I generally don't really read technical books, I declined the offer, but Sandy from Packt was very persistent, and asked if I could spread the word about a book giveaway contest. The book is definitely PostgreSQL related, and Xzilla suggested [...]
Posted by Josh Berkus in pgExperts on 2013-04-25 at 00:39:38
So a boolean mode() was very simple to construct, partly because booleans have only three potential values (TRUE, FALSE, NULL).  However, in many cases we want mode() for, say, integers.  Now, there are several constructions of mode() depending on what statistic you really want, but here I'm concerned with the simplest one: Most Common Value (MCV).

One way to get this kind of mode is to do a windowing function, but as mentioned this works poorly with other aggregates in the same result set.  So let's take the same custom aggregate approach and see if we can do better.

Now, for types other than boolean, mode() is a "two-pass aggregate".  That means it's impossible to calculate in one pass; you need two passes, one to sort the set, and one to count the items and pick the MCV.   Since we know we'll need two passes, we'll construct our aggregate to assume that it's receiving sorted data going in, and make sure it gets sorted data when we use it.  Given 9.2's new ORDER BY clause for aggregates, that's easy to ensure.
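
To illustrate the 9.2 syntax being relied on here (using a built-in aggregate rather than the one we are about to build), an aggregate call can sort its own input:

-- the aggregate sees 1,2,2,3 in sorted order regardless of row order in vals
SELECT array_agg(i ORDER BY i)
  FROM (VALUES (3), (1), (2), (2)) AS vals(i);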

For a state type, we'll need a 4-part register.  This register will include:

  1. the last seen value
  2. the count of the last seen value
  3. the most common value so far
  4. the count of the most common value so far
Again, we'll use an array rather than a composite type so that we don't have to create a type for the function.  Then we can construct a pure-SQL state function:

create or replace function sorted_mode_state(
    tally bigint[], nextval bigint
)
returns bigint[]
language sql
immutable
as $f$

-- have we seen this value before?
SELECT CASE WHEN nextval = tally[1] THEN

    -- if we've seen it before, does it out-count the prior MCV?
    CASE WHEN tally[2] = tally[4] THEN

        -- if so, swap the MCV
        ARRAY[ nextval, tally[2] + 1, nextval, tally[2] + 1 ]

    ELSE
        -- if not, keep counting
        ARRAY[ nextval, tally[2] + 1, tally[3], tally[4] ]
    END

-- is the register uninitialized?  then take the first value
WHEN tally[1] IS NULL THEN
    ARRAY [ nextval, 1, nextval, 1 ]

-- skip nulls
W

[continue reading]

Among the many new features implemented in 9.3, pg_dump now offers the possibility to perform parallel dumps. This feature has been introduced by the commit below. commit 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f Author: Andrew Dunstan Date: Sun Mar 24 11:27:20 2013 -0400   Add parallel pg_dump option.   New infrastructure is added which creates a [...]
Entity-attribute-value (EAV) modelling is typically seen as an antipattern, and for good reasons.  Misapplied, it leads to inflexible queries, poor performance and a number of other bad things.  Nonetheless there are classes of problems where EAV is the right solution for the problem at hand.  Two specific areas of LedgerSMB will demonstrate this, namely account/dropdown management and the menu schema.  These actually represent different classes of problems, but they have a number of things in common.  The goal of this post is to help readers better understand where EAV modelling can be of great help.

In general, the areas of LedgerSMB which use EAV modelling are stable, rarely changing data with relatively simple to non-existent validation routines (the menu is expected to be externally validated).   In some cases the basic expectations of EAV are modified slightly.  In both these cases (which represent relatively different classes of problems) we have been very happy with EAV and while our approach will no doubt be refined in the future, it is unlikely that these portions of the software will ever move from EAV.

Note the LedgerSMB development team does not advocate starting with EAV for most problems.  Usually where EAV is appropriate it will naturally emerge in the schema design process.

What is EAV?


EAV is a way to model relatively free-form data in such a way as to dispense with the need for extremely complex, long tables for specifying sparse data.  A typical EAV approach has three basic tables representing entity data, attributes, and value data.

To make EAV work successfully, as much data as possible should be put in the entity table or parent/child tables.  It's only where this approach breaks down that EAV is a good idea.  The examples below will flesh this out more.  However, a basic typical design might look like:

CREATE TABLE entity (
    id serial not null unique,
    label text primary key,
    ....
);

CREATE TABLE attribute (
   id serial not null unique,
   att_name text not null,
   att_datat

[continue reading]
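
The excerpt cuts off partway through the attribute table; purely as a hedged sketch of the conventional three-table layout being described (column names are illustrative, not LedgerSMB's actual schema), the remaining pieces might look like:

CREATE TABLE attribute (
   id           serial not null unique,
   att_name     text not null,
   att_datatype text not null
);

CREATE TABLE value (
   entity_id    int not null references entity(id),
   attribute_id int not null references attribute(id),
   value        text,
   primary key (entity_id, attribute_id)
);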

Posted by Joel Jacobson on 2013-04-24 at 13:41:32

Two people might be simultaneously modifying the same database function. To protect against this scenario, I created a little tool to stage the source code of all functions in pg_proc and the definition of all views. You then deploy your changes, and afterwards you call fdiff() to show a diff of the changes made, by comparing the new content of pg_proc and the definitions of all views. If the changes are what you expected, you go ahead and commit.

We've been using this methodology every day at both my current company and my previous one, and it has prevented us from making mistakes.

This is how to use it:

  1. Start a new transaction

    BEGIN;

  2. Temp store state of all views/functions

    SELECT fstage();

  3. Deploy changes to schema

    CREATE OR REPLACE FUNCTION ...
    CREATE OR REPLACE VIEW ...

  4. Show a diff of changes made

    SELECT fdiff();

  5. If the changes are expected, go ahead and commit

    COMMIT;

Example of the output from fdiff():


+-------------------------------+
| Updated or replaced functions |
+-------------------------------+

Schema................: public
Name..................: process_order
Argument data types...: _orderid bigint
Result data type......: boolean
Language..............: plpgsql
Type..................: normal
Volatility............: STABLE
Owner.................: amazon
20 c OR Orders.ShippingDate IS NOT NULL
20 c OR Orders.ShippingDate > now() - interval '2 month'


Backing up every database on every version and each running cluster on a host, in a single script. And also encrypting the backup files on the fly and generating a script for retrieving the encrypted backup files from the backup host... :)

In my previous blog entry I didn't explain any of the prerequisites for using the script, so I'll try to cover these below.

I have a PostgreSQL database running in a remote location (backend for some webapps/websites) - and I wanted to expand my backup script, so that I could download my (encrypted) backup files from this location over plain HTTP.

So I have a Linux box (httpserver) that I use as my backup host. On this box I create a backup user:

root@httpserver:~#useradd -m <backupusername>
So now I have a user that is locked (no password) with a home in /home/<backupusername>. I want to only be able to log on with a key, so this is OK. But it's probably easier to give the backup user a temporary password, so that I can use ssh-copy-id to transfer the key. So let's do that:
root@httpserver:~#passwd <backupusername>

Then on my PostgreSQL Linux box (logged in as postgres), I generate an ssh key pair (and do this with no password!):

postgres@postgresqlserver:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):
Created directory '/var/lib/postgresql/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.
The key fingerprint is:.....
And then transfer this key to the backup host:
postgres@postgresqlserver:~$ ssh-copy-id <backupusername>@httpserver
<backupusername>@httpserver's password:
Now try logging into the machine, with "ssh '<backupusername>@httpserver'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
And if I try to log on:
postgres@postgresqlserver:~$ ssh <backupusername>@httpserver
I should

[continue reading]

PostgreSQL has a pretty decent user base in India. The Indian government, a host of companies, and freelancers employ it as a database backend of choice. Pune city in India has a huge concentration of PostgreSQL developers/users, with many PostgreSQL-affiliated companies having their development centers set up here.

read more

Posted by Robert Treat in OmniTI on 2013-04-20 at 20:32:00
A couple of months ago the folks at PACKT asked me if I could tech review one of their new books: PostgreSQL Backup and Restore How-to. What caught my eye about this was the idea behind the book: pick a single topic that is important to people using the software, and then cover that topic quickly and efficiently. Postgres is a really large piece of software, with a heck of a lot of moving parts, so it's difficult to cover the entire thing in one book. This approach is one that I have been suggesting to publishers for a while, so I was happy to help PACKT with their attempt.

The book itself covers a number of different options when it comes to Postgres backups, from pg_dump to making filesystem backups using PITR and the WAL system. If you're working with Postgres and you have questions about the different options available for doing backups and/or restores, I encourage you to check it out.

[Cover of Packt's PostgreSQL Backup and Restore How-to book]
Posted by Leo Hsu and Regina Obe on 2013-04-20 at 17:25:00

In Happy Valentine PostGIS we demonstrated how to use PostGIS raster to decipher letters from a raster, vectorize them, and then reuse these vectorized letters to form new words. Admittedly the letters were a bit grainy since they were vectorizations of low-res rasters and I didn't bother smoothing them. Bruce Rindahl offered a script to convert SVG to PostGIS geometry, using Batik to convert a font file to SVG format, and gave me a hi-res converted kankin fontset. I still haven't figured out how his script works.

Bborie Park thought that was all too complicated and thought (as I always have) that we need an ST_GeomFromSVG function for PostGIS, which he is on a mission to create when he's less busy. He also suggested I wrap my letter writer function as an extension. Taking all these ideas, I formulated an extension you install with

CREATE EXTENSION postgis_letters;

postgis_letters (http://www.bostongis.com/postgisstuff/postgis_letters_extension.zip) is an sql / data extension containing mostly data, but as the name suggests relying on PostGIS. The data are geometry vectors of the kankin font. I plan to add in more free fonts later once I figure out how to use Bruce's script or Bborie comes up with a better way and also more positioning logic and handling of spaces. So its a little rough at the moment. The purpose of the extension is so I can write words on my images in reports e.g. state names or overlay labels on geometry features like roads and land. Using the power of both geometry/raster you can have a fully functioning report image writer that would return a fully formed image for use in OpenOffice (or for my ASP.NET web apps Active Reports.NET). This wouldn't rely on any mapping server to draw images (just pure PostGIS/PostgreSQL). Hopefully with new and improved binary features coming in PSQL for (looks like 9.4), outputting these raster images from psql will also be trivial. While on my mission to do something useful, I got distracted by something more entertaining: describing spatial processes with words. He

[continue reading]

Posted by Joel Jacobson on 2013-04-20 at 15:48:31

I just watched a presentation from Mar 11, 2012, where Paul Graham / YCombinator gives away seven billion-dollar startup ideas:

  1. A New Search Engine
  2. Replace Email
  3. Replace Universities
  4. Internet Drama
  5. The Next Steve Jobs
  6. Bring Back Moore’s Law
  7. Ongoing Diagnosis

Idea 6 on the list is basically about the need to invent the sufficiently smart compiler, capable of understanding how to automatically break down a problem (expressed in code) into pieces which can be executed in parallel, in order to utilize GPUs and/or a lot of CPU cores.

Building such a compiler is probably extremely difficult, if possible at all, perhaps mainly due to the complexity that can be expressed in C, C++, Python, Java or any normal imperative language.

Imagine instead the same problem but for a declarative language like SQL. Would it be equally difficult? Core member Bruce Momjian of the PostgreSQL team has already expressed his ideas on how to tackle the problem.

When you think about it, the query planner is a bit like a compiler, kind of. It takes the SQL and figures out a smart way to execute it, breaking it down into a lot of sub-tasks, some of which could beneficially be executed in parallel.

So let’s hope the great pg-hackers will pull this off, so we don’t have to pay some future start-up compiler-billionaire license fees to get automatic parallelism.


Continuous Integration (CI) using automated open source tools such as Jenkins and Hudson is rapidly gaining adoption. These tools help developers gain confidence for creating more robust code rapidly by improving the testing and QA process. The flexibility of these tools adds other challenges for the DBAs!

One of our clients came across the challenge of cleaning up databases from the Jenkins CI cluster after X number of days, because each run creates a separate database and the database names are not standard, as they are provided by users. If they don't clean up old databases, the cluster will have hundreds of databases at the end of the week. We tried to standardize database names, but you can't stop users from making mistakes or choosing arbitrary db names :-) On the other hand, Postgres's system catalogs don't provide the database creation date. How can I find databases older than X days and drop them?

I came across this blog entry that answers my question, but I was looking for an easier way! I found an easier way to get the database creation time with a single query! Yay :-) The following query can be used to find the database creation time. It should return the correct creation date as long as you haven't run pg_upgrade on the data directory. I thought I'd share it here so it will be useful for others!

SELECT datname, (pg_stat_file('base/'||oid||'/PG_VERSION')).modification AS datcreated
FROM pg_database;

 

postgres=# SELECT datname, (pg_stat_file('base/'||oid||'/PG_VERSION')).modification AS datcreated
postgres-# FROM pg_database;
  datname   |       datcreated
------------+------------------------
 template1  | 2013-03-28 16:04:13-04
 template0  | 2013-03-28 16:04:14-04
 postgres   | 2013-03-28 16:04:14-04
 rangetypes | 2013-03-28 16:14:42-04
 puppet     | 2013-03-28 16:23:13-04
 omniti     | 2013-04-20 10:02:22-04
(6 rows)
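
Building on the same trick, here is a hedged sketch of the "older than X days" part of the question; the 30-day cutoff and the exclusion of template databases are illustrative assumptions:

SELECT datname
  FROM pg_database
 WHERE NOT datistemplate
   AND (pg_stat_file('base/'||oid||'/PG_VERSION')).modification
       < now() - interval '30 days';

The resulting names could then be fed to dropdb (or DROP DATABASE) by the cleanup job.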

 

Ideally, the pg_database system catalog should include a database_created timestamp! Hopefully, that day will come sooner rather than later :-)

Feel free to comment if you have any other ideas for getting these details, or if you see any corner cases with the above

[continue reading]

Preface

There are a lot of tools for PostgreSQL development, and I want to propose one more. Why Lazarus:

  • Lazarus is shipped with open-source Pascal compiler;
  • it has a high degree of Delphi compatibility;
  • availability on a variety of platforms, including Windows, Mac OS X, and Linux;
  • complete source code is available;

Installing Lazarus

Go to Lazarus at SourceForge.

Then at the Downloads Section get proper installer. I used lazarus-1.0.8-fpc-2.6.2-win32.exe and lazarus-1.0.8-fpc-2.6.2-win64.exe in my tests. These are the latest releases at the moment.

Here are the screen shots of the installation process where you can see what I had chosen:
Lazarus Installation

Path to install

Full installation

I have Lazarus installed into C:\lazarus\ and hereafter I will use this path.

Getting PostgresDAC

Now it's time for PostgresDAC. At the moment many Lazarus IDE functions require the source and warn if the source of a unit is missing. Thus we need the PostgresDAC version that ships with sources. Sources are available to licensed users in their profile.

MicroOLAP Profile

I have extracted PostgresDAC sources to C:\PostgresDAC\.

Install PostgresDAC into the IDE

In the Lazarus IDE choose Open Package and select dclPostgresDACL.lpk. This is the design-time package. Then in the Package window choose Use → Install.
Install package

Confirmation about rebuild will appear.

Rebuild?

Choose “Yes”. Wait for some time. After successful rebuild Lazarus will reopen itself. Tada! We got it!

PostgresDAC in the Lazarus

Now we may open some demo projects and play some time.
GridDemo for PostgresDAC in the Lazarus


Posted by Andrew Dunstan in pgExperts on 2013-04-18 at 15:02:23
The other day Josh Berkus was working with one of our clients and needed a fast way to unnest some arrays with ordinality, that is, to have a row number attached to each element. The usual recommended way of doing this is to use subscripting with generate_series(), or to call unnest() in a plpgsql function and then return the elements one by one. Both of these turned out to be not fast enough, and he was resorting to some fairly ugly code to get around the speed issue. I offered him a simple C function that would do this very quickly, and we're releasing it publicly today. In use on our client's site it has been extremely fast and simple to use.

example:
andrew=# create extension unnest_ordinality;

andrew=# select * from unnest_ordinality('{a,b,c,d,e,f,g}'::text[]);
 element_number | element
----------------+---------
              1 | a
              2 | b
              3 | c
              4 | d
              5 | e
              6 | f
              7 | g
The package is available on The PostgreSQL Experts Inc Github Repository and also on PGXN
Posted by Josh Berkus in pgExperts on 2013-04-17 at 20:59:18
Custom aggregates are one of those features which are unique to Postgres, and seem hopelessly obscure.  Once you actually create one in earnest though, you'll wonder how you ever lived without one.  To get you started, I'm going to walk you through creating a very simple custom aggregate: one which gives you the mode (most frequent value) for a boolean column.

Why would you want such a thing?  Well, imagine you're monitoring your webservers, and you want to present 1-hour summaries of whether they are up or down.  However, you have data for each 30 seconds.  For a webserver which is up most of the time you want to return TRUE; for one which was down for most of the hour you want to return FALSE.  If the monitoring system was down (and thus there's no data), you want to return NULL.

You could do this using windowing queries.  However, that doesn't work well with other cumulative statistics, such as the number of minutes up.  You want something you can display side-by-side with other aggregate stats.  Well, with PostgreSQL, it's surprisingly easy!

First, we need a "state function" which accumulates data about the boolean.  This state function generally has two parameters, a data type which accumulates value, and the data type of the column you're aggregating.  In our case, we want to accumulate two counters: a count of falses and a count of trues, which we do using an array of INT.  This can be done with a pure-SQL function:

CREATE OR REPLACE function mode_bool_state(int[], boolean)
returns int[]
language sql
as $f$
SELECT CASE $2
WHEN TRUE THEN
    array[ $1[1] + 1, $1[2] ]
WHEN FALSE THEN
    array[ $1[1], $1[2] + 1 ]
ELSE
    $1
END;
$f$;


Once both registers have been accumulated, we need to use a "final" function to compare them and decide which is the mode, which will accept the accumulation type (INT[]) and return boolean:

CREATE OR REPLACE FUNCTION mode_bool_final(INT[])
returns boolean
language sql
as $f$
SELECT CASE WHEN ( $1[1] = 0 AND $1[2] = 0 )
THEN NULL
ELSE $1[1] >= $1[2]
END;
$f$;


Then we can

[continue reading]
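
The excerpt stops just before the pieces are tied together; a minimal sketch of how that might look follows (the aggregate name, the INITCOND, and the uptime_checks table in the usage example are assumptions, not necessarily what the post goes on to use):

CREATE AGGREGATE mode_bool (boolean) (
    SFUNC     = mode_bool_state,
    STYPE     = int[],
    FINALFUNC = mode_bool_final,
    INITCOND  = '{0,0}'
);

-- one row per webserver and hour, TRUE if it was up most of the time
-- (uptime_checks is a hypothetical monitoring table)
SELECT server_id,
       date_trunc('hour', checked_at) AS hour,
       mode_bool(is_up) AS mostly_up
  FROM uptime_checks
 GROUP BY server_id, date_trunc('hour', checked_at);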

Continuing the coverage of new JSON features added in Postgres 9.3, and after writing about JSON data generation and JSON operators, let's now focus on some new functions that can be used for the parsing of JSON data. There are many new functions introduced: json_each, json_each_text, json_extract_path, json_extract_path_text, json_object_keys, json_populate_record, json_populate_recordset, json_array_length, json_array_elements. The [...]
Posted by Robert Treat in OmniTI on 2013-04-17 at 11:45:03
The phpPgAdmin Team is proud to announce the new major release of
phpPgAdmin. Version 5.1 adds many new features, bug fixes and updated
translations over the previous version.

This version has been long overdue, and brings with it stable support
for all current versions of PostgreSQL, including 9.1 and 9.2. In
addition, a fair number of bugs have been fixed, including a bug that
could lead to corrupted bytea data, so all users are strongly
encouraged to upgrade.

We appreciate the large number of people that use phpPgAdmin on a
regular basis, and hope this new version will help make things even
better!

Download


To download phpPgAdmin 5.1 right now, visit:
http://phppgadmin.sourceforge.net/doku.php?id=download

Features


  • Full support for PostgreSQL 9.1 and 9.2

  • New plugin architecture, including addition of several new hooks (asleonardo, ioguix)

  • Support nested groups of servers (Julien Rouhaud & ioguix)

  • Expanded test coverage in Selenium test suite

  • Highlight referencing fields on hovering over Foreign Key values when browsing tables (asleonardo)

  • Simplified translation system implementation (ioguix)

  • Don't show cancel/kill options in process page to non-superusers

  • Add download ability from the History window (ioguix)

  • User queries now paginate by default


Translations


  • Lithuanian (artvras)


Bug Fixes


  • Fix several bugs with bytea support, including possible data corruption bugs when updating rows that have bytea fields

  • Numerous fixes for running under PHP Strict Standards

  • Fix an issue with autocompletion of text based Foreign Keys

  • Fix a bug when browsing tables with no unique key


Incompatibilities


  • phpPgAdmin core is now UTF-8 only

  • We have stopped testing against Postgres versions < 8.4, which are EOL



Regards,
The phpPgAdmin Team

In almost any application it's common to want to aggregate some set of values together, commonly in comma-separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully it's not a surprise that there's a much better way to do this with PostgreSQL.

Postgres has a flexible and robust array datatype that comes with a variety of functions. Even without taking advantage of the array datatype in your application, you can still take advantage of some of these functions to get the functionality you need. Let's take a look at an example schema and use case.

An example

Given a project management application, you may have users who have projects that have tasks. An example piece of functionality might be to send an email with a list of all projects that have tasks past their due dates. Your schema might look something like this:

 # \d users
             Table "public.users"
    Column   |            Type             | Modifiers
 ------------+-----------------------------+-----------
  id         | integer                     | not null
  email      | character varying(255)      |
  ...

# \d projects
             Table "public.projects"
    Column   |            Type             | Modifiers
 ------------+-----------------------------+-----------
  id         | integer                     | not null
  user_id    | integer                     | not null
  name       | character varying(255)      | not null
  ...

# \d tasks
             Table "public.tasks"
    Column     |            Type             | Modifiers
 --------------+-----------------------------+-----------
  id           | integer                     | not null
  project_id   | integer                     | not null
  completed_at | timestamp without time zone | 
  due_at       | timestamp without time zone | 
  ...

To get a list of all projects that have tasks that haven't been completed, you would start

[continue reading]
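
The post is cut off before the query itself, but as a hedged sketch of where it is heading (using string_agg over the schema above; the post's actual continuation may differ), the per-user list of overdue project names could be built like this:

SELECT users.email,
       -- DISTINCT avoids repeating a project that has several overdue tasks
       string_agg(DISTINCT projects.name, ', ') AS overdue_projects
  FROM users
  JOIN projects ON projects.user_id = users.id
  JOIN tasks    ON tasks.project_id = projects.id
 WHERE tasks.due_at < now()
   AND tasks.completed_at IS NULL
 GROUP BY users.email;

Each resulting row is ready to drop straight into the reminder email: one address plus a comma-separated list of its overdue projects.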