In a talk which I gave at PGCONF.IN and, in a shorter version, at PGCONF.US, I had a few slides on who contributes to PostgreSQL development. Here, I'd like to present a slightly expanded version of the information which was in the talk. The information in this post considers calendar year 2016 and comes from two sources.
VP, Chief Architect, Database Server @ EnterpriseDB, PostgreSQL Major Contributor and Committer
Wednesday, April 26, 2017
Saturday, April 08, 2017
New Features Coming in PostgreSQL 10
The list of new features coming in PostgreSQL 10 is extremely impressive. I've been involved in the PostgreSQL project since the 8.4 release cycle (2008-2009), and I've never seen anything like this. Many people have already blogged about these features elsewhere; my purpose here is just to bring together a list of the features that, in my opinion, are the biggest new things that we can expect to see in PostgreSQL 10. [Disclaimers: (1) Other people may have different opinions. (2) It is not impossible that some patches could be reverted prior to release. (3) The list below represents the work of the entire PostgreSQL community, not specifically me or EnterpriseDB, and I have no intention of taking credit for anyone else's work.]
Tuesday, March 14, 2017
Parallel Query v2
A recent Twitter poll asked What is your favorite upcoming feature of PostgreSQL V10? In this admittedly unscientific survey, "better parallelism" (37%) beat out "logical replication" (32%) and "native partitioning" (31%). I think it's fruitless to argue about which of those features is actually most important; the real point is that all of those are amazing features, and PostgreSQL 10 is on track to be an amazing release. There are a number of already-committed or likely-to-be-committed features which in any other release would qualify as headline features, but in this release they'll have to fight it out with the ones mentioned above.
Tuesday, August 02, 2016
Uber's move away from PostgreSQL
Last week, a blog post by an Uber engineer explained why Uber chose to move from PostgreSQL to MySQL. This article was widely reported and discussed within the PostgreSQL community, with many users and developers expressing the clear sentiment that Uber had indeed touched on some areas where PostgreSQL has room for improvement. I share that sentiment. I believe that PostgreSQL is a very good database, but I also believe there are plenty of things about it that can be improved. When users - especially well-known names like Uber - explain what did and did not work in their environment, that helps the PostgreSQL community, and the companies which employ many of its active developers, figure out what things are most in need of improvement. I'm happy to see the PostgreSQL community, of which I am a member, reacting to this in such a thoughtful and considered way.
Thursday, May 26, 2016
PostgreSQL Regression Test Coverage
Yesterday evening, I ran the PostgreSQL regression tests (make check-world) on master and on each supported back-branch three times on hydra, a community test machine provided by IBM. Here are the median results:
9.1 - 3m49.942s
9.2 - 5m17.278s
9.3 - 6m36.609s
9.4 - 9m48.211s
9.5 - 8m58.544s
master, or 9.6 - 13m16.762s
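To compare these timings numerically, the values can be converted to seconds. The following is only a rough sketch: the helper names `to_seconds` and `growth_vs` are made up for illustration, and the only data used are the medians quoted above.

```python
import re

# Median wall-clock times quoted above, keyed by branch.
MEDIANS = {
    "9.1": "3m49.942s",
    "9.2": "5m17.278s",
    "9.3": "6m36.609s",
    "9.4": "9m48.211s",
    "9.5": "8m58.544s",
    "master": "13m16.762s",
}


def to_seconds(text):
    """Convert a shell `time` value such as '3m49.942s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def growth_vs(baseline, medians=MEDIANS):
    """Return each branch's run time as a multiple of the baseline branch."""
    base = to_seconds(medians[baseline])
    return {branch: to_seconds(value) / base for branch, value in medians.items()}


if __name__ == "__main__":
    for branch, ratio in growth_vs("9.1").items():
        seconds = to_seconds(MEDIANS[branch])
        print(f"{branch}: {seconds:.1f}s ({ratio:.2f}x the 9.1 run time)")
```

Using 9.1 as the baseline is an arbitrary choice; any other branch name from the table works the same way.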
Thursday, April 21, 2016
PostgreSQL 9.6 with Parallel Query vs. TPC-H
I decided to try out parallel query, as implemented in PostgreSQL 9.6devel, on the TPC-H queries. To do this, I followed the directions at https://github.com/tvondra/pg_tpch - thanks to Tomas Vondra for those instructions. I did the test on an IBM POWER7 server provided to the PostgreSQL community by IBM. I scaled the database to use 10GB of input data; the resulting database size was 22GB, of which 8GB was indexes. I tried out each query just once without really tuning the database at all, except for increasing shared_buffers to 8GB. Then I tested them again after enabling parallel query by configuring max_parallel_degree = 4.
Monday, March 21, 2016
Parallel Query Is Getting Better And Better
Back in early November, I reported that the first version of parallel sequential scan had been committed to PostgreSQL 9.6. I'm pleased to report that a number of significant enhancements have been made since then. Of those, the two that are by far the most important are that we now support parallel joins and parallel aggregation - which means that the range of queries that can benefit from parallelism is now far broader than just sequential scans.
Thursday, March 10, 2016
No More Full-Table Vacuums
I just committed a very important patch to PostgreSQL. The short summary for the patch is "Don't vacuum all-frozen pages." and it follows up on a patch I committed last week, whose short summary was "Change the format of the VM fork to add a second bit per page." This led Andres Freund to respond with a one word email: "Yeha!"
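As a rough illustration of what "a second bit per page" buys, here is a toy model in Python. It is not PostgreSQL's actual code or on-disk format: the class, the flag values, and the `pages_to_scan` helper are all assumptions made up for this sketch. The idea it shows is simply that once each heap page has an "all-frozen" bit next to its "all-visible" bit, a vacuum that needs to freeze tuples can skip every page already marked all-frozen.

```python
# Toy flags for this sketch only; two bits are stored per heap page.
ALL_VISIBLE = 0x01
ALL_FROZEN = 0x02
BITS_PER_PAGE = 2
PAGES_PER_BYTE = 8 // BITS_PER_PAGE


class ToyVisibilityMap:
    """Hypothetical two-bit-per-page map packed into a bytearray."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.bits = bytearray((n_pages + PAGES_PER_BYTE - 1) // PAGES_PER_BYTE)

    def _locate(self, page):
        if not 0 <= page < self.n_pages:
            raise IndexError(page)
        return page // PAGES_PER_BYTE, (page % PAGES_PER_BYTE) * BITS_PER_PAGE

    def set(self, page, flags):
        """Turn on the given flag bits for one page."""
        byte, shift = self._locate(page)
        self.bits[byte] |= (flags & 0x03) << shift

    def clear(self, page):
        """Clear both bits, as would be needed when a page is modified."""
        byte, shift = self._locate(page)
        self.bits[byte] &= ~(0x03 << shift) & 0xFF

    def get(self, page):
        byte, shift = self._locate(page)
        return (self.bits[byte] >> shift) & 0x03


def pages_to_scan(vm):
    """Pages a freezing vacuum still has to visit: those not all-frozen."""
    return [p for p in range(vm.n_pages) if not vm.get(p) & ALL_FROZEN]


if __name__ == "__main__":
    vm = ToyVisibilityMap(6)
    vm.set(1, ALL_VISIBLE)
    vm.set(2, ALL_VISIBLE | ALL_FROZEN)
    vm.set(5, ALL_VISIBLE | ALL_FROZEN)
    print(pages_to_scan(vm))
```

With only the all-visible bit available, the vacuum in this toy would have no way to tell page 1 apart from pages 2 and 5, so it would have to read all of them.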
Tuesday, January 12, 2016
PostgreSQL Past, Present, and Future: Moving The Goalposts
It's nice to see that PostgreSQL 9.5 is finally released! There are a number of blog posts out about that already, not to mention stories in InfoWorld, V3, and a host of other publications. Of all the publicity, though, I think my favorite piece is a retrospective post by Shaun Thomas reviewing how far PostgreSQL has come over the last five years. As Shaun notes, both the scalability and the feature set of PostgreSQL have increased enormously over the last five years. It's easy to miss when you look at one release, or even (as I do) one commit, at a time, but the place where we are now is a whole new world compared to where we were back then.
Wednesday, November 11, 2015
Parallel Sequential Scan is Committed!
I previously suggested that we might be able to get parallel sequential scan committed to PostgreSQL 9.5. That did not happen. However, I'm pleased to report that I've just committed the first version of parallel sequential scan to PostgreSQL's master branch, with a view toward having it included in the upcoming PostgreSQL 9.6 release.
Friday, October 30, 2015
Planning Parallel and Distributed Queries
I have been somewhat lax about blogging for the last six months or so due to having been even busier than usual with various projects, and I think that's likely to continue for at least the next month or two as I work to finish the first version of parallel query for PostgreSQL. If you have been following the PostgreSQL commit log recently, you will have noticed many new commits building up towards that goal.
However, I wanted to take a minute to point out the presentation that I did yesterday at 2015.pgconf.eu, which I have now uploaded to my presentations web site. The title of the presentation is "Planning Parallel and Distributed Queries". If you have not closely followed the development of parallel query, you might find this presentation interesting to review, because it gives examples of the types of query plans I hope that PostgreSQL will be able to generate in the future.
(Everything in the talk represents future work ... and not all of it will be in 9.6!)
Tuesday, March 31, 2015
PostgreSQL Shutdown
PostgreSQL has three shutdown modes: smart, fast, and immediate. For many years, the default has been "smart", but Bruce Momjian has just committed a patch to change the default to "fast" for PostgreSQL 9.5. In my opinion, this is a good thing; I have complained about the current default, and agreed with others complaining about it, many times, at least as far back as December of 2010. Fortunately, we now seem to have achieved consensus on this change.
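For anyone who wants to pick the mode explicitly rather than rely on the default, `pg_ctl stop` accepts a `-m` option naming the mode. The Python wrapper below is just a sketch: the function names and the example data directory are hypothetical, and it assumes `pg_ctl` is on the PATH.

```python
import subprocess

# Signal each mode corresponds to when sent to the postmaster directly.
SHUTDOWN_SIGNALS = {
    "smart": "SIGTERM",      # wait for clients to disconnect on their own
    "fast": "SIGINT",        # disconnect clients, roll back open transactions
    "immediate": "SIGQUIT",  # abort without a clean shutdown; recovery on restart
}


def build_stop_command(data_dir, mode="fast"):
    """Return the pg_ctl argument list for stopping a server in the given mode."""
    if mode not in SHUTDOWN_SIGNALS:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]


def stop_server(data_dir, mode="fast"):
    """Run pg_ctl; raises CalledProcessError if pg_ctl reports failure."""
    subprocess.run(build_stop_command(data_dir, mode), check=True)


if __name__ == "__main__":
    # Hypothetical data directory, printed rather than executed.
    print(" ".join(build_stop_command("/path/to/data", "fast")))
```

Passing the mode explicitly gives the same behavior on releases before and after the change of default.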
Wednesday, March 18, 2015
Parallel Sequential Scan for PostgreSQL 9.5
Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5. It is not all there yet, but we are making very good progress. I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort. Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done. Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.
Monday, December 22, 2014
Parallelism Update
It's been over a year since I last blogged about parallelism, so I think I'm past due for an update, especially because some exciting things are happening.
First, Amit Kapila has published a draft patch for parallel sequential scan. Many things remain to be improved about this patch, which is neither as robust as it needs to be nor as performant as we'd like it to be nor as well-modularized as it really should be. But it exists, and it passes simple tests, and that is a big step forward. Even better, on most of Amit's tests, it shows a very substantial speed-up over a non-parallel sequential scan.
Wednesday, August 06, 2014
Memory Matters
Database performance and hardware selection are complicated topics, and a great deal has been written on those topics over the years by many very smart people, like Greg Smith, who wrote a whole book about PostgreSQL performance. In many cases, the answers to performance questions require deep understanding of software and hardware characteristics and careful study and planning.
But sometimes the explanation is something very simple, such as "you don't have enough memory".
Tuesday, June 10, 2014
Linux disables vm.zone_reclaim_mode by default
Last week, Linus Torvalds merged a Linux kernel commit from Mel Gorman disabling vm.zone_reclaim_mode by default. I mentioned that this change might be in the works when I blogged about attending LSF/MM and again when I blogged about how the page cache may not behave quite the way we want even with vm.zone_reclaim_mode disabled.
For those who haven't read previous discussion on this topic, either on my blog, on pgsql-performance, or elsewhere around the Internet, enabling vm.zone_reclaim_mode can cause a lot of problems for applications, such as PostgreSQL, that make use of more page cache than will fit on a single NUMA node. Pages may get evicted from memory in preference to using memory on other nodes, effectively resulting in a page cache that is much smaller than available free memory. See the second of the two blog posts linked above for more details.
PostgreSQL isn't the only application that suffers from non-zero values of this setting, so I think a lot of people will be happy to see this change merged (like the guy who said that this setting is the essence of all evil). It will doubtless take some time for this to make its way into mainstream Linux distributions, but getting the upstream change made is the first step. Thanks to Mel Gorman for pursuing this.
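Until the new default reaches your distribution, it is easy to check what a given machine is doing. Here is a small sketch in Python that reads the setting from procfs; the helper names are made up, and it assumes a Linux system where the file `/proc/sys/vm/zone_reclaim_mode` exists (it may be absent on kernels built without NUMA support).

```python
from pathlib import Path

# Location of the setting in procfs on Linux.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the current value as an int, or None if the file is absent."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


def describe(value):
    """Turn the raw value into a short human-readable status."""
    if value is None:
        return "not available on this system"
    if value == 0:
        return "disabled"
    return f"enabled (value {value})"


if __name__ == "__main__":
    print("vm.zone_reclaim_mode is", describe(read_zone_reclaim_mode()))
```

If this reports a non-zero value on a database server, the setting can be changed with `sysctl -w vm.zone_reclaim_mode=0`, plus an entry in `/etc/sysctl.conf` to make it persist across reboots.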
Tuesday, May 13, 2014
Troubleshooting Database Corruption
When your database gets corrupted, one of the most important things to do is figure out why that happened, so that you can try to ensure that it doesn't happen again. After all, there's little point in going to a lot of trouble to restore a corrupt database from backup, or in attempting to repair the damage, if it's just going to get corrupted again. However, there are times when root cause analysis must take a back seat to getting your database back on line.
Wednesday, April 16, 2014
Why The Clock is Ticking for MongoDB
Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But it is not, as Schireson would have us believe, that the relational database community is ignorant of or has not tried the design paradigms which he advocates, but that they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is also misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.
Wednesday, April 02, 2014
Subtly Bad Things Linux May Be Doing To PostgreSQL
In addition to talking about PostgreSQL at LSF/MM and Collab, I also learned a few things about the Linux kernel that I had not known before, some of which could have implications for PostgreSQL performance. These are issues which I haven't heard discussed before in the PostgreSQL community, and they are somewhat subtle, so I thought it would be worth writing about them.
Monday, March 31, 2014
Back from LSF/MM and Collab
Last week, I attended the Linux Storage, Filesystems, and Memory Management summit (LSF/MM) on Monday and Tuesday, and the Linux Collaboration Summit (aka Collab) from Wednesday through Friday. Both events were held at the Meritage Resort in Napa, CA. This was by invitation of some Linux developers who wanted to find out more about what PostgreSQL needs from the Linux kernel. Andres Freund and I attended on behalf of the PostgreSQL community; Josh Berkus was present for part of the time as well.
My overall impression is that it was a good week, except that by Thursday the combination of 14-hour days and jet lag was catching up with me in a big way. However, from the point of view of the PostgreSQL project, I think it was very positive. On Monday, Andres and I had an hour-and-a-half slot; we used about an hour and fifteen minutes of that time. Our big complaint was with the Linux kernel's fsync behavior, but we talked about some other issues as well, including double buffering, transparent huge pages, and zone reclaim mode.