As a newly minted Packt author, it makes sense that I might get a request to review one of their books from time to time. On this particular occasion, I had the opportunity to take a look at Instant PostgreSQL Starter by fellow PostgreSQL user and author Daniel K. Lyons.
I’ll be straightforward with a giant caveat: I’m not the target audience for this booklet. I tried to read with the perspective of a new user, since we’ve all been there once, but please bear with me if I break character.
Many new users will find that the examples given using pgAdmin are easy to follow and perform as expected. Users who are new to PostgreSQL likely don’t want to fiddle with the command line for basic functionality, especially if they are coming from another database such as SQL Server, Oracle, or even MySQL. And for more complex cases such as hstore, XML manipulation, or full-text searching, we’re treated to function and view syntax that helps abstract away some of the ugly or annoying details.
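The book’s exact wrappers aren’t reproduced here, but the idea looks roughly like this (a minimal sketch with made-up table and view names):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE items (
    id    serial PRIMARY KEY,
    attrs hstore
);

-- Hide the arrow syntax behind a view so a new user can select plain columns.
CREATE VIEW items_flat AS
SELECT id,
       attrs -> 'color' AS color,   -- hstore key lookup returns text
       attrs -> 'size'  AS size
  FROM items;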
That said, the pgAdmin result listings were somewhat skimpy, especially when more advanced features are introduced. Seeing the output of some of these would have been nice, considering the rather cumbersome and advanced syntax. A new user might have trouble fully understanding these and, if reading without following along, would have no basis for comparison. Additionally, Daniel’s adherence to pgAdmin extends only to using it as a connection method. When creating a user, table, or database, he prefers to use pure SQL instead of pgAdmin’s interface to create or modify these objects. Considering this book is for new users, and there is an entire section on basic SQL syntax for interacting with tables, why omit this?
Speaking of interacting with tables, for the newest of new users, Instant PostgreSQL Starter launches into a quick introduction to SQL syntax. Most readers can generally skip this section, but it’s good to know it’s included. Why? One commonly accepted aspect of marketing is to Get ‘em While They’re Young. If new use…
Learning PostgreSQL, and SQL in general, probably begins with the concept of TABLES. Then probably VIEWS, INDEXES and maybe TRIGGERS. Some users might not ever go any further, which is sad, because there is so much more to explore!
I thought it would be cool to automatically generate a graph showing the dependencies between objects and their types. It shows the order in which the different types of objects can be created; it is perhaps mentally useful to think in terms of “after we have created a TABLE we can create an INDEX”. Something like this would be nice to include in the online PostgreSQL documentation, as I think it would be helpful when learning about PostgreSQL’s different object types. The graph below was produced using the GraphViz dot command and live data from pg_catalog.pg_depend:
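(The author’s exact query isn’t shown; the following rough sketch of the idea prints DISTINCT DOT edges that can be wrapped in “digraph g { … }” and fed to dot. The object-type labels are my own mapping from pg_depend’s classid/refclassid columns.)

WITH labelled AS (
    SELECT CASE d.classid
               WHEN 'pg_class'::regclass THEN
                    (SELECT CASE c.relkind WHEN 'r' THEN 'TABLE'
                                           WHEN 'i' THEN 'INDEX'
                                           WHEN 'v' THEN 'VIEW'
                                           WHEN 'S' THEN 'SEQUENCE' END
                       FROM pg_class c WHERE c.oid = d.objid)
               WHEN 'pg_proc'::regclass      THEN 'FUNCTION'
               WHEN 'pg_type'::regclass      THEN 'TYPE'
               WHEN 'pg_language'::regclass  THEN 'LANGUAGE'
               WHEN 'pg_namespace'::regclass THEN 'SCHEMA'
               WHEN 'pg_trigger'::regclass   THEN 'TRIGGER'
           END AS dep_obj,
           CASE d.refclassid
               WHEN 'pg_class'::regclass THEN
                    (SELECT CASE c.relkind WHEN 'r' THEN 'TABLE'
                                           WHEN 'i' THEN 'INDEX'
                                           WHEN 'v' THEN 'VIEW'
                                           WHEN 'S' THEN 'SEQUENCE' END
                       FROM pg_class c WHERE c.oid = d.refobjid)
               WHEN 'pg_proc'::regclass      THEN 'FUNCTION'
               WHEN 'pg_type'::regclass      THEN 'TYPE'
               WHEN 'pg_language'::regclass  THEN 'LANGUAGE'
               WHEN 'pg_namespace'::regclass THEN 'SCHEMA'
               WHEN 'pg_trigger'::regclass   THEN 'TRIGGER'
           END AS ref_obj
      FROM pg_depend d
)
-- The referenced object must exist first, so edges point referenced -> dependent.
SELECT DISTINCT format('"%s" -> "%s";', ref_obj, dep_obj) AS edge
  FROM labelled
 WHERE dep_obj IS NOT NULL AND ref_obj IS NOT NULL;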
As we can see, before anything else we need a SCHEMA, which is the root node. Once we have a SCHEMA, we can create TABLES, TYPES, VIEWS, SEQUENCES and FUNCTIONS. Some users might not even know about SCHEMAs, as the schema “public” is pre-installed. To create an INDEX, we first need a TABLE. Etc, etc, etc…
You might be surprised that FUNCTION and LANGUAGE have arrows pointing in both directions. It turns out you need some functions, such as plperl_call_handler, before you can create a language like plperl. The self-referencing arrow from/to FUNCTION is less surprising, as some functions can of course call other functions.
(Not all object types are included in this graph as I’m not using them all in my system.)
PgCon 2013 was attended by 256 people from across the globe. Attendees had the opportunity to enjoy tutorials, talks and an excellent unconference (the last of which deserves special mention).
I gave a talk on full-text search using Sphinx and Postgres (you can find the slides at http://t.co/lgFoLq37EC, and all of the talks have been recorded). The quality of the talks in general was quite good, but I don't want to repeat what you will find in other posts.
The unconference ran quite late into the evening. You can find its schedule, as well as minutes from some of the talks that happened (and some that didn't), here.
There was a special emphasis on the pluggable storage feature, although most agree that it will be very difficult to implement in the next few versions. A related topic was the Foreign Data Wrapper enhancements. The pluggable storage discussion continued afterwards. The main reason everybody agrees on this feature is that an API for storage would allow companies to collaborate on code rather than fork off into other projects.
There was also a long hallway discussion about migrations using pg_upgrade. On the replication side, the featured topics were bi-directional and logical replication.
The full-text search unconference discussion was pretty interesting. Oleg Bartunov and Alexander showed really interesting upcoming work on optimizing GIN indexes. According to their benchmarks, Postgres could improve performance significantly.
There were a lot of discussions I missed, due to the wide number of tracks and "hall spots". But the majority of attendees I heard from agreed that the unconference was quite exciting and opened the door to many new ideas.
During the closing session of PGCon this year, the core team announced the addition of four new committers to PostgreSQL:
These have all been involved in both writing new code for PostgreSQL and reviewing other people's patches over the last couple of development cycles. With this addition, we will increase our capacity to handle the rising number of contributions we get, and get even more features into the upcoming versions of PostgreSQL.
Welcome to the team!
PGCon certainly had some energizing talks and meetings this week. First, Jonathan Katz gave a tutorial about Postgres data types. Though I missed his talk, I just reviewed his slides and it is certainly a great tour of our powerful type system.
Second, Oleg Bartunov and Teodor Sigaev gave a great presentation about merging the very popular hstore and JSON data types into a new hierarchical hash data type, humorously code-named 'jstore'. (Their work is funded by Engine Yard.) This generated a huge amount of discussion, which has continued into today's unconference.
Third, Alvaro Hernandez Tortosa's talk The Billion Tables Project (slides) really stretched Postgres to a point where its high-table-count resilience and limitations became more visible. A significant amount of research was required to complete this project.
A while back I posted some SQL which helps keep track of changes to the PostgreSQL settings file. I've found it useful when benchmarking tests with different settings, but unfortunately the pg_settings_log() function needs to be run manually after each setting change. However, that sounds like something which a custom background worker (new in 9.3) could handle: basically, all the putative background worker would need to do is execute the pg_settings_log() function whenever the server starts (or restarts) or receives SIGHUP.
This turned out to be surprisingly easy to implement. Based on the example contrib module and Michael Paquier's excellent posts, this is the code. Basically, all it does is check for the presence of the required database objects (a function and a table) on startup, execute pg_settings_log() on startup, and add a signal handler for SIGHUP which also calls pg_settings_log().
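(The actual function and table are defined in the earlier post linked above; a simplified sketch of the kind of objects the worker checks for might look like this, with a snapshot-everything function standing in for the real change-tracking one:)

CREATE TABLE IF NOT EXISTS settings_log (
    logged_at timestamptz NOT NULL DEFAULT now(),
    name      text        NOT NULL,
    setting   text
);

-- Simplified stand-in: record every current setting on each call.
CREATE OR REPLACE FUNCTION pg_settings_log() RETURNS void AS $$
    INSERT INTO settings_log (name, setting)
    SELECT name, setting FROM pg_settings;
$$ LANGUAGE sql;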
This post was originally posted on Medium, a new blogging platform made up mostly of people who aren’t necessarily subscribed to Planet. So, please forgive the obvious statements, as the target audience is people who don’t know very much about Postgres.
Wednesday May 23, with no fanfare, Tom Lane’s move to Salesforce.com was made public on the Postgres developer wiki.
For 15 years, Tom has contributed code to Postgres, an advanced open source relational database that started development around the same time as MySQL but has lagged behind it in adoption amongst web developers. Tom’s move is part of a significant pattern of investment by large corporations in the future of Postgres.
For the past few years, Postgres development has accelerated. Built with developer add-ons in mind, Postgres offers things like PLV8 and an extensible replication system, which have held the interest of companies like NTT and captured the imagination of Heroku.
Tom has acted as a tireless sentry for this community. His role for many years, in addition to hacking on the most important core bits, was to defend quality and a “policy of least surprise” when implementing new features.
Development for this community is done primarily on a mailing list. Tom responds to so many contributor discussions that he’s been the top overall poster on those mailing lists since 2000, with over 85k messages.
Really, he’s a cultural touchstone for a community of developers that loves beautiful, correct code.
Someone asked: “What does [Tom’s move] mean for Postgres?”
You probably don’t remember this:
“Salesforce.com bases its entire cloud on Oracle database,” Ellison said, “but its database platform offering is PostgreSQL. I find that interesting.”
When I read that last October, I was filled with glee, quickly followed by terror. I love my small database community, my friends and my job. What if Oracle shifted its attention to our community and attacked it directly? So far, that hasn’t happened.
Instead, Salesforce advertised they were hiring “5 new engineers…and 40 to 50 mo…
Running accurate database benchmark tests is hard. I’ve managed to publish a good number of them without being embarrassed by errors in procedure or results, but today I have a retraction to make. Last year I did a conference talk called “Seeking PostgreSQL” that focused on worst-case situations for storage. And that, it turns out, had a giant error. The results for the Intel 320 Series SSD were much lower in some cases than they should have been, because the drive’s NCQ feature wasn’t working properly. When presenting this talk I had a few people push back that the results looked weird, and I was suspicious too. I have a correction to publish now, and I think the way this slipped by me is itself interesting. The full updated SeekingPostgres talk is also available, with all of the original graphs followed by an “Oops!” section showing the new data.
Native Command Queueing is an important optimization for seek-heavy workloads. When trying to optimize work for a mechanical disk drive, it’s very important to know where the drive’s head currently is when deciding where to go next. If you have a queued read for that same area of the drive, you want to service that one now, getting the I/O out of the way while you’re nearby, before moving to another physical area of the disk.
However, on an SSD, you might think that re-ordering commands isn’t that important. If reads are always inexpensive, taking a constant and small period of time on a flash device, their order doesn’t matter, right? Well, that’s wrong on a few counts. The idea that reads always take the same amount of time on an SSD is a popular misconception. There’s a bit of uncertainty around what else is happening in the drive. Flash cells are made of blocks larger than a single database read. What happens if you are reading 8K of a cell that is being rewritten right now, because someone is updating another 8K section? Coordinating that is likely to pause your read for a moment. It doesn’t take much lag at SSD speeds to result in a noticeable slowdown.
Thanks to Shaun M. Thomas, I was offered a digital copy of the “Instant PostgreSQL Backup” book from Packt Publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current work situation, doing a lot of PostgreSQL advocacy and basic teaching, I was interested in reviewing this one…
As the Instant series motto says, it’s short and fast. I somewhat disagree with the “focused” part for this one, but that’s perfectly fine considering the aim of the book.
Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It taught you the basics: how to install, basic SQL queries, some rudimentary PHP integration. This book reads a bit like its PostgreSQL-based counterpart. It’s a quick trip through installation, basic manipulation, and the (controversial) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.
So, what’s inside? I’d say: what you need to kick-start with PostgreSQL.
The installation part is straightforward: download, click, done. Now you can launch pgAdmin, create a user and a database, and you’re done. Next time someone tells you PostgreSQL isn’t easy to install, show them this book.
The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple: Create, Read, Update, Delete. You won’t learn about indexes, functions, or advanced queries here. For someone discovering SQL, it’s just what you need to know to get started…
The last part, “Top 9 features you need to know about”, is a bit harder to describe. PostgreSQL is an RDBMS with batteries included; choosing only 9 features must have been really hard for the author, and nobody can be blamed for not picking this or that feature you happen to like: too much choice… The author spends some time on pgcrypto, the RETURNING clause with serial, hstore, XML, even recursive queries… This is, from my point of view, the troublesome part of the book: mentioning all these features means introducing complicated SQL queries. I would never te…
I’ve been hacking on a tool to allow resynchronizing an old master server after failover. Please take a look: https://github.com/vmware/pg_rewind.
We just concluded the PgCon Developer Meeting. I had two big items. First, EnterpriseDB has dedicated staff to start work on parallelizing Postgres queries, particularly in-memory sorts; I have previously expressed the importance (and complexity) of parallelism. Second, EnterpriseDB has also dedicated staff to help improve Pgpool-II. Pgpool is the Swiss army knife of replication tools, and I am hopeful that additional development work will further increase its popularity.
The Developer Meeting notes (summary) have lots of additional information about the big things coming from everyone next year.
One of OmniTI's clients requested help with a sample application that inserts JSON data into Postgres using the Java JDBC driver. I'm not a Java expert, so it took me a while to write a simple Java program to insert data. TBH, I got help writing the test application from one of our Java engineers at OmniTI. Now the test application is ready, and the next step is to make it work with the JSON datatype! After struggling a little to find a workaround for string escaping in the Java code, I stumbled upon a data type issue!
The test application connects to my local Postgres installation and inserts JSON data into this sample table:
postgres=# \d sample
Table "public.sample"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
data | json |
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
Something exploded running the insert: ERROR: column "data" is of
type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 42
After some research, I found out that there is no standard JSON type on the Java side, so adding support for json to the Postgres JDBC driver is not straightforward! A StackOverflow answer helped me test the JSON datatype handling at the psql level. As Craig mentioned in the answer, the correct way to solve this problem is to write a custom Java mapping type that uses the JDBC setObject method. This can be a bit tricky, though. A simpler workaround is to tell PostgreSQL to cast implicitly from text to json:
postgres=# create cast (text as json) without function as implicit;
CREATE CAST
The WITHOUT FUNCTION clause is used because text and json have the same on-disk and in-memory representation; they're basically just aliases for the same data type. AS IMPLICIT tells PostgreSQL it can convert without being explicitly told to, allowing things like this to work:
postgres=# prepare test(tex…
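(The statement above is cut off in the original; a hypothetical completion, assuming the sample table from earlier, might look like this:)

postgres=# prepare test(text) as insert into sample (id, data) values (1, $1);
PREPARE
postgres=# execute test('{"name": "test"}');
INSERT 0 1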
One of the things that really frustrated me about the KNN GIST distance operators (bounding box and box centroid) that came in PostgreSQL 9.1 and PostGIS 2.0 was the fact that one of the elements needed to be a constant to take advantage of the index. In PostGIS speak, this meant you couldn't put it in the FROM clause and could only enjoy it in one of two ways.
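(To illustrate the restriction with a made-up PostGIS table: the first query can use the KNN index because one side of the operator is a constant, while the second, with both sides coming from the FROM clause, cannot.)

-- Index-assisted: the right-hand side of <-> is a constant point.
SELECT name
  FROM places
 ORDER BY geom <-> ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326)
 LIMIT 10;

-- Not index-assisted in 9.1/2.0: both operands come from the FROM clause.
SELECT a.name, b.name
  FROM places a, places b
 ORDER BY a.geom <-> b.geom
 LIMIT 10;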
Having recently posted some thoughts on Shaun Thomas' "PostgreSQL Backup and Restore How-to", I was asked by Packt if I'd like to review the new "Instant PostgreSQL Starter" by Daniel K. Lyons and was kindly provided with access to the ebook version. As I'm happily in a situation where I may need to introduce PostgreSQL to new users, I was interested in taking a look, and here's a quick overview.
It follows the same "Instant" format as the backup booklet, which I quite like as it provides a useful way of focussing on particular aspects of PostgreSQL without getting bogged down in reams of tl;dr documentation. "Instant Pg Starter" is divided into three sections:
- Installation
- Quick start – creating your first table
- Top 9 features you need to know about

It occurs to me I forgot to congratulate the winners of the free ebooks. So without further ado:
Congrats to the winners. But more than that, I call upon them to pay it forward by contributing to the community, either by corresponding on the excellent PostgreSQL mailing lists, or maybe submitting a patch or two to the code. There’s a lot of ground to cover, and more warm bodies always help.
Thanks again, everyone!