It's nice to see that PostgreSQL 9.5 is finally released! There are a number of blog posts out about that already, not to mention stories in InfoWorld, V3, and a host of other publications. Of all the publicity, though, I think my favorite piece is a retrospective post by Shaun Thomas reviewing how far PostgreSQL has come over the last five years. As Shaun notes, both the scalability and the feature set of PostgreSQL have increased enormously in that time. It's easy to miss when you look at one release, or even (as I do) one commit, at a time, but the place where we are now is a whole new world compared to where we were back then.
Back in 2010, I wrote a blog post entitled Big Ideas and a follow-up post called Lots and Lots of PostgreSQL Feature Requests summarizing first the ideas that I had and then the responses that I heard from other people about what was missing from PostgreSQL. Five years later, many of the items on those lists are somewhat dated. PostgreSQL 9.5 adds INSERT ... ON CONFLICT, which answers many of the use cases that led people to ask for MERGE; and it adds row-level security. Index-only scans, granular collation support, SELinux integration, LATERAL, updatable views, and materialized views are all features we've had for so long now that most PostgreSQL users probably don't even think of them as new features any more. Previous releases have also added new data types, particularly jsonb, which allows easy manipulation of JSON data and which has been further enhanced in PostgreSQL 9.5, as well as new index types like SP-GiST. PostgreSQL 9.5 adds BRIN indexes, part of a growing body of work to adapt PostgreSQL to progressively larger workloads.
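For readers who haven't tried it yet, here is a minimal sketch of the "insert or update" pattern that INSERT ... ON CONFLICT enables. To keep it runnable without a server, it uses Python's built-in sqlite3 module as a stand-in: SQLite 3.24 and later accept an upsert clause modeled on PostgreSQL's. The settings table and the set_setting helper are hypothetical names chosen purely for illustration.

```python
import sqlite3

# Assumption: SQLite (3.24 or later) stands in for PostgreSQL so that this
# sketch runs without a database server. Table and helper names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)")


def set_setting(name, value):
    """Insert a setting, or overwrite its value if the name already exists."""
    conn.execute(
        "INSERT INTO settings (name, value) VALUES (?, ?) "
        "ON CONFLICT (name) DO UPDATE SET value = excluded.value",
        (name, value),
    )


set_setting("theme", "light")
set_setting("lang", "en")
set_setting("theme", "dark")  # conflicts on name, so the existing row is updated

# Collect the table contents as a {name: value} mapping.
rows = dict(conn.execute("SELECT name, value FROM settings"))
```

In both systems, excluded refers to the row that was proposed for insertion, so a single statement covers the "new key" and "existing key" cases without a separate lookup. Against PostgreSQL itself the statement has the same shape, with whatever placeholder style your driver uses.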
Many of the really important features from those lists which haven't yet been released are well underway. For example, we don't have multi-master replication in core just yet, but enormous progress has been made with the inclusion of logical decoding in PostgreSQL 9.4, and there will in all likelihood be more progress on logical replication in PostgreSQL 9.6. A very simple version of parallel query has already been committed to PostgreSQL 9.6 and further enhancements are on the way. Work is also under way on partitioning syntax.
It's time, then, to look to the future. What are the major gaps that exist in PostgreSQL today, as opposed to five years ago? Specifically, once we get through the twin knotholes of parallel query and logical replication, long-overdue projects where the slow progress we've made is a direct result of just how very difficult it is to get them off the ground, what comes next? Leave a comment below with your thoughts.
I outlined some of my own ideas about this in a presentation called The Elephants in the Room, which I gave at both pgconf.us 2015 and pgconf.eu 2015. Both video and slides are online. That presentation mentions both parallel query and logical replication, of course, plus a few other things:
1. Horizontal Scalability. While PostgreSQL 9.5 scales to large boxes better than any previous release (and probably worse than any future release, since we keep making improvements!), what happens when you need more than one server for your workload? PostgreSQL 9.5 has a little-noted feature to allow foreign tables to participate in table inheritance, which is a long way of spelling "sharding" if you tilt your head just right, but the query planner and executor capabilities to make it a really killer feature are not there yet. More generally, regardless of how we get there, leveraging one box effectively is good, but leveraging multiple boxes is better.
2. On-Disk Format. At each of the last two instances of PGCon (protip: you should go), there has been some discussion about making PostgreSQL's table storage pluggable, just as our indexing system has been for many years. This would open the door either to replacing PostgreSQL's storage format entirely with something better, without thereby breaking backward compatibility; or perhaps more likely, to introducing specialized storage formats which are better for certain applications. Storage formats which are more compact and therefore allow reading the same amount of useful data from the disk with less physical I/O seem particularly important.
3. Built-In Connection Pooling. I'm not sure how many users are out there using an external connection pool and wishing they could be rid of it, but I'll bet there are some.
The presentation also talks about direct I/O, but in some sense that's not really a feature from a user perspective. If somebody implements it and that turns out to be beneficial, the feature will be improved performance. Of course, one can never have enough performance, whatever the source.
Again, thoughts on other things PostgreSQL needs are very welcome. Please comment below on what else you think should be added. Thanks.
This post originally appeared on Robert Haas' personal blog.
Robert Haas is Vice President, Chief Database Architect, at EnterpriseDB.