There have been a number of articles on the upcoming improvements in Postgres v11, including one by EnterpriseDB’s own Bruce Momjian. A couple of the most commonly highlighted areas are in parallelism and partitioning.
Something I haven’t seen discussed much, however, is at the intersection of these two capabilities. In particular, Postgres v11 will introduce a more effective strategy for how workers are allocated when executing queries that span database partitions.
Here's how it works.
In Postgres v10 if you execute a query over a table that spans a number of partitions, you can get a plan like this:
-> Parallel Sequential Scan on partition1
-> Parallel Sequential Scan on partition2
-> Parallel Sequential Scan on partition3
When this query is executed, all of the available workers will start scanning partition1 in parallel. When there are no more blocks left in partition1 the next worker that needs a block will get the first block from partition2. This is certainly an improvement over earlier releases of Postgres since you now have multiple workers executing a parallel plan that covers all the partitions. However, while it is scanning each partition in parallel, it is largely scanning one partition at a time with a little bit of overlapping when it moves from one partition to the next.
In contrast, with Postgres v11, you will get a plan where the "Append" in the above example is replaced by "Parallel Append”. This will cause the workers to spread out across the partitions. For example if you have 4 workers, one of the partitions will end up being scanned by 2 workers, and the other 2 partitions will end up with 1 worker each. When you finish with one partition, the worker or workers allocated to that partition will be spread out across those that remain so that the total number of workers per partition remains as even as possible.
The advantage of the v11 strategy is that it avoids CPU contention and enhances I/O parallelism when database partitions are placed on separate storage volumes. This may seem like a small change, but for some workloads it has the potential to have a significant impact on performance when the data partitions are spread across a number of physical devices.
I will note that it is possible for the v11 strategy to perform worse. For example, if all of the partitions were placed on the same magnetic disk, you could end up thrashing the disk head all over the place with nonstop seeks instead of letting it read sequentially. While best practices would distribute partitions across physical media for performance sensitive workloads, there is a kill switch for parallel append available just in case.
Every #TechTuesday, EnterpriseDB shares a how-to post authored by a Postgres contributor and expert for Postgres Gems, the PostgresRocks community forum. PostgresRocks is a community to discuss all things Postgres. Join us at PostgresRocks and be part of the conversation.
Robert is Chief Architect, Database Server, employed at EnterpriseDB as well as a PostgreSQL Committer. Robert is an expert in OLTP query tuning, schema design, triggers and stored procedures, and internals development, as well as an experienced UNIX/Linux system administrator. Additionally, Robert is a full-life-cycle web developer with skills in needs analysis, application design, database schema design, user interface design, development, testing, user training and acceptance testing, and maintenance.