Sharding or horizontal scalability is a popular topic, discussed widely on PostgreSQL mailing lists these days. When we started the Postgres-XC project back in 2010, not everyone was convinced that we need a multi-node PostgreSQL cluster that scales with increasing demand. Even more likely, we, the PostgreSQL community, were skeptical about whether we have enough resources to design and implement a complex sharding architecture. Also, there was so much work that we could have done for vertical scalability and the obvious choice for many was to focus on the vertical scalability aspects of performance.
With great work that Robert Haas, Andres Freund and others have done in the last few releases, law of marginal utility is now catching up with the vertical scalability enhancements. Now, almost 6 years later, it’s becoming quite clear that horizontal scalability is a must to meet the demand of the users who very much like to use RDBMS for managing and querying their terabytes of data. Fortunately, Postgres-XC and Postgres-XL community have already proved the merits of a sharding solution based on the chosen design. These products have also matured over the last few years and are ready for public consumption. Of course, in the long run we all, including the Postgres-XL community, would like to push most of these features back into core PostgreSQL. But this is a multi-year, multi-release effort and until such time, the Postgres-XL community is committed to support, maintain and even enhance the product.
Here is a short list of features, in no particular order, that IMHO any sharding or horizontal scalability solution should support.
- Transparent and efficient placement of data for query optimisation
- On-demand addition and removal of nodes
- Scalabale components
- Guaranteeing transaction ACID properties all the time on all the nodes
- Parallel execution of queries on nodes
- Offloading execution to the remote nodes
- Connection pooling between various nodes
- UI to configure, monitor and manage the shards and other components
- High availability
- Node membership
Anything else?
A few of these features have already made it to core PostgreSQL, may be in a different form. Some of them may have been inspired by the work that was previously done as part of the Postgres-XC or Postgres-XL projects. But there are a whole bunch of things that we need to work on and get in a committable shape. I’m sincerely hoping that we start working in that direction sooner than later, leveraging the knowledge and the technology developed as part various sharding solutions.