DDL Improvements in EDB Postgres Advanced Server: Building Parallel Indexes and Automatic Partitioning

December 04, 2020

 

PostgreSQL includes many great features built into the database engine. EDB Postgres Advanced Server builds on top of PostgreSQL to enhance and expand these capabilities, and adds Oracle-like syntax to leverage them. Two examples of such capabilities are partitioning and parallelization. Partitioning can greatly reduce query duration, and it reduces the impact of maintenance on large tables, for example when table vacuums take longer than the idle period between vacuums. Parallelization spreads a task across multiple CPUs, allowing it to use more resources and finish sooner.

The latest release, EDB Postgres Advanced Server 13, adds some great DDL enhancements that simplify and extend both of these features. You can write DDL that specifies custom index settings, sets parallelization options per index, and uses simplified syntax to define complex partitioning schemes with enhanced capabilities. The automatic capabilities reduce the manual effort of maintaining partitioning schemes, and the Oracle-like syntax lowers migration risk and effort.

If you want to run DDL that creates tables with automatic partitioning, configures parallel index builds, and specifies custom indexes, then EDB Postgres Advanced Server 13 is just what you need.

So, what is it, how does it work, and why is it so great?

 

Custom Index Specifications with PostgreSQL

Specifying custom index options can be required for many reasons. A common one is storing indexes in another tablespace, which might be needed to meet company policy or to improve lookup performance by placing a frequently used index on faster storage.
The PostgreSQL table DDL syntax has no notion of custom index specifications, so only indexes with default options can be created as part of the CREATE TABLE statement. To back a constraint with an index that has custom options, the constraint must be created with a separate statement.
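As a minimal sketch in plain PostgreSQL (the table and constraint names here are hypothetical, and tablespace tbs1 is assumed to exist), the customized index is built first and the constraint is attached to it in a second statement:

CREATE TABLE my_table
  (
  id INT
  );

-- Build the index with custom options (the tablespace) first
CREATE UNIQUE INDEX idx_uq_my_table ON my_table (id) TABLESPACE tbs1;

-- Then attach the constraint to the pre-built index in a separate statement
ALTER TABLE my_table ADD CONSTRAINT uq_my_table_id UNIQUE USING INDEX idx_uq_my_table;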

 

Parallel Index Build in PostgreSQL

Building and maintaining indexes can be a long-running task, and while it runs, query performance can be severely impacted. PostgreSQL 11 introduced parallel index builds, which allow B-tree indexes to be built with parallelization. This can greatly reduce the time needed to build an index. The degree of parallelization is determined by the table-level storage parameter parallel_workers, which can be set with the ALTER TABLE statement; setting a different value for parallel_workers can speed up builds or reduce resource usage. When parallel_workers is not set, the number of workers is chosen automatically based on the table size and capped by the configuration parameter max_parallel_maintenance_workers, while the memory used by the build remains limited by maintenance_work_mem, which is shared among the workers.
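A minimal sketch of the session-level knob involved (the value 4 is only illustrative):

SHOW max_parallel_maintenance_workers;

-- Raise the cap on worker processes a single CREATE INDEX (or VACUUM) may use
SET max_parallel_maintenance_workers = 4;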

 

Example of Custom Index Specifications and Parallel Index Build in PostgreSQL

The following PostgreSQL syntax can be used to create an index with a parallelization of 2:

CREATE TABLE my_parallel_table
  (
  id           INT,
  country_code VARCHAR(5)
  );

ALTER TABLE my_parallel_table SET (parallel_workers=2);

CREATE UNIQUE INDEX idx_uq_id ON my_parallel_table (id) TABLESPACE tbs1;

While the native PostgreSQL syntax for building a parallel index serves its purpose, you can see that the user must first create the table, then set the parallelization (table-wide), and finally create the index. If multiple indexes with different parallelization options are required, PostgreSQL syntax allows repeating the last two statements with different options for every index, but because parallel_workers is a table-wide setting, every change overwrites the value used for all indexes on the table. The custom TABLESPACE option is a second reason the UNIQUE INDEX must be created in a separate statement.
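To make the overwrite problem concrete, here is a hedged sketch that adds a hypothetical second index (idx_country, not part of the original example) with a different setting; note how the table-wide value changes for every subsequent build:

ALTER TABLE my_parallel_table SET (parallel_workers=4);
CREATE INDEX idx_country ON my_parallel_table (country_code);
-- parallel_workers is now 4 for the whole table, so a later REINDEX of
-- idx_uq_id would also be planned with up to 4 workers.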

 

Custom Index Specifications and Parallel Index Build in EDB Postgres Advanced Server

With EDB Postgres Advanced Server 13, we have added the following keywords to enhance this behaviour:

USING INDEX: This allows users to define any custom index as the basis for the constraint. Of course, the index needs to match the constraint (e.g. unique indexes should back unique constraints and primary keys), but it can carry all kinds of customization, such as specifying the tablespace and setting the (newly added) PARALLEL keyword for this index. See the CREATE INDEX documentation for more information.

PARALLEL: This keyword leverages the PostgreSQL infrastructure for building indexes in parallel. The keyword sets parallel_workers as a custom option on the index (whereas the PostgreSQL implementation sets it per table only). This also means the parallel option can be set differently for each index.

With these keywords, in EDB Postgres Advanced Server the PostgreSQL syntax example can be replaced by a single DDL statement:

CREATE TABLE my_parallel_table2
  (
     id           INT,
     country_code VARCHAR(5),
     CONSTRAINT uq_id UNIQUE(id)
     USING INDEX
       (
       CREATE UNIQUE INDEX idx_uq_id
       ON my_parallel_table2 (id)
       TABLESPACE tbs1 PARALLEL 2)
  );

The USING INDEX keyword allows a custom index specification, and the PARALLEL keyword sets the parallel_workers option, which in EDB Postgres Advanced Server can be customized per index. And the icing on the cake is that this is in line with Oracle syntax, which is a great help during migrations too.
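To verify the per-index setting, the description above suggests it is stored as an index-level storage option, so a catalog query along these lines should reveal it (a sketch; the exact reloptions output may differ):

SELECT relname, reloptions
FROM   pg_class
WHERE  relname = 'idx_uq_id';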

Partitioning with PostgreSQL

Partitioning can greatly reduce duration of queries and impact on the system in multiple ways:

  • Partitioning can prune unnecessary partitions before further processing is done.
  • Queries might perform better because heavily-used parts of the indexes fit in memory. 
  • Partitioning can also reduce the impact on maintenance when applied to large tables. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server. 
  • For tables holding many billions of tuples, vacuum settings can be difficult to get right for the entire table. Splitting such tables into partitions brings the pieces back to a manageable size, so the default settings already work better. Vacuuming can also be tuned per partition if required (as sketched after this list), and even with all optimizations in place, vacuuming one partition has less impact than vacuuming the entire table.
  • A nice side effect of partitioning is that vacuuming of the table is split into separate vacuum actions per partition, spread over time, which has less immediate impact than one long vacuum of the entire table.
  • Multiple partitions can be vacuumed in parallel, and PostgreSQL 13 also added parallel vacuuming of a single table's indexes.
  • Another use case is tables with high change rates. Such tables usually contain hotspots (frequently changed tuples), which typically end up in only a few partitions. When vacuuming is needed (for example because table vacuums take longer than the idle period between vacuums), it only touches those partitions, uses fewer resources, and finishes much sooner. Vacuuming them more often therefore has less impact on the system and can improve overall query performance through up-to-date statistics and lower I/O requirements.
  • For data maintenance, bulk loads and deletes can be accomplished with much less impact by adding or removing partitions.
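As a minimal sketch of two of the points above, partition pruning and per-partition vacuum tuning, assume a hypothetical table sales that is list-partitioned on country_code and has a partition sales_usa:

-- With partition pruning, only the partition holding 'USA' rows should be scanned
EXPLAIN SELECT count(*) FROM sales WHERE country_code = 'USA';

-- Autovacuum can be tuned for a single hot partition instead of the whole table
ALTER TABLE sales_usa SET (autovacuum_vacuum_scale_factor = 0.05);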

Partitioning should be part of the overall schema design, and the improvement should be tested in your specific use case. But implementing partitioning properly can be really beneficial.

PostgreSQL 10 introduced declarative partitioning, a native syntax for defining partitioning out of the box. EDB Postgres Advanced Server has offered partitioning since version 9.5. The options are:

  • Range Partitioning: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
  • List Partitioning: The table is partitioned by explicitly listing which key values appear in each partition.
  • Hash Partitioning (v11): The table is partitioned by specifying a modulus and a remainder for each partition. Each partition holds the rows for which the hash value of the partition key, divided by the specified modulus, produces the specified remainder, as the sketch below illustrates.
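To make the modulus/remainder mechanism concrete, here is a minimal native PostgreSQL sketch (the table name hash_native is hypothetical):

CREATE TABLE hash_native (id INT) PARTITION BY HASH (id);

CREATE TABLE hash_native_p0 PARTITION OF hash_native
  FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE hash_native_p1 PARTITION OF hash_native
  FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE hash_native_p2 PARTITION OF hash_native
  FOR VALUES WITH (MODULUS 3, REMAINDER 2);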

This is an example of creating a table with LIST partitioning in PostgreSQL:

CREATE TABLE list_tab
  (
     id           INT,
     country_code VARCHAR(5)
  )
PARTITION BY LIST (country_code);

CREATE TABLE list_tab_ind PARTITION OF list_tab FOR VALUES IN ('IND');

CREATE TABLE list_tab_usa PARTITION OF list_tab FOR VALUES IN ('USA');

Note that with PostgreSQL syntax, creating the parent table and the partitions requires separate statements.

Automatic List Partitioning with EDB Postgres Advanced Server

EDB Postgres Advanced Server supports the PostgreSQL syntax, but adds support for Oracle syntax, and some additional enhancements. The following example creates a table with 2 partitions, one for country_code = 'IND', and one for country_code = 'USA':

CREATE TABLE list_tab
  (
     id           INT,
     country_code VARCHAR(5)
  )
PARTITION BY LIST (country_code)
  (
  PARTITION p1 VALUES ('IND'),
  PARTITION p2 VALUES ('USA')
  );

The example defines a rather static setup with 2 partitions. Now let's assume that the business evolves beyond India and the USA and starts to sell in the UK too. Inserting data with country_code 'UK' would not succeed:
 

INSERT INTO list_tab VALUES (1, 'UK');

The statement would raise the following error:

ERROR:  no partition of relation "list_tab" found for row
DETAIL:  Partition key of the failing row contains (country_code) = (UK).
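The manual fix would be to add the missing partition by hand. For the Oracle-style table above, EDB Postgres Advanced Server offers ALTER TABLE ... ADD PARTITION for this (a sketch, with p3 as a hypothetical partition name):

ALTER TABLE list_tab ADD PARTITION p3 VALUES ('UK');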

Of course, the DBA could keep manually creating and adding new partitions this way, but it would be great if this could work automatically, without human intervention. This is where Automatic List Partitioning, added in EDB Postgres Advanced Server 13, comes in:

CREATE TABLE list_tab
  (
     id           INT,
     country_code VARCHAR(5)
  )
PARTITION BY LIST (country_code) AUTOMATIC
  (
  PARTITION p1 VALUES ('IND'),
  PARTITION p2 VALUES ('USA')
  );

Due to the AUTOMATIC keyword, the statement inserting 'UK' into the table would now succeed, and in the background a new partition would be created to store records with partition key FOR VALUES IN ('UK'). Inserts for new key values simply work, and the matching partitions are created automatically, with no manual intervention.
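To see the automatically created partition (the system picks its name), a catalog query such as the following should list it next to p1 and p2:

SELECT c.relname
FROM   pg_inherits i
       JOIN pg_class c ON c.oid = i.inhrelid
WHERE  i.inhparent = 'list_tab'::regclass;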

Automatic Hash Partitioning with EDB Postgres Advanced Server

In addition to List Partitioning, PostgreSQL also supports native syntax to define Hash Partitioning. EDB Postgres Advanced Server also supports the Oracle syntax for Hash Partitioning, and adds some more options. Below is an example of the EDB Postgres Advanced Server syntax for Hash Partitioning:

CREATE TABLE hash_tab
  (
  col1 NUMBER,
  col2 NUMBER
  )
PARTITION BY HASH (col1, col2)
  (
  PARTITION p1 TABLESPACE tbs1,
  PARTITION p2 TABLESPACE tbs2,
  PARTITION p3 TABLESPACE tbs3
  );

The statement would create a table with 3 hash partitions. For 3 partitions this is very doable, but imagine creating one hundred partitions with such a statement. This is where the new Automatic Hash Partitioning syntax comes in. The following example uses Automatic Hash Partitioning to create 3 partitions using the PARTITIONS keyword:

CREATE TABLE hash_tab
  (
  col1 NUMBER,
  col2 NUMBER
  )
PARTITION BY HASH (col1, col2)
PARTITIONS 3 STORE IN (tbs1, tbs2, tbs3);

Three partitions are created automatically, and the data is spread across them.
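To check how rows are distributed after loading some data, a quick sketch using tableoid (which reveals the partition a row landed in):

SELECT tableoid::regclass AS partition, count(*)
FROM   hash_tab
GROUP  BY 1
ORDER  BY 1;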

Combining Automatic Partitioning Options with EDB Postgres Advanced Server

In addition to automatic partitioning for list and hash partitions, EDB Postgres Advanced Server (beginning with version 12) also allows automatic range partitioning (with interval partitioning being one of its use cases). These partitioning options can even be combined, which is where the real power comes into play.

The following example combines the earlier examples into a table with list partitions and hash subpartitions:

CREATE TABLE list_hash_tab
  (
  country_code VARCHAR(5),
  col1         NUMBER,
  col2         NUMBER
  )
PARTITION BY LIST (country_code) AUTOMATIC
SUBPARTITION BY HASH (col1, col2)
SUBPARTITIONS 3
  (
  PARTITION ind VALUES ('IND')
  );

For this example, whenever a row is inserted with a new partition key (like 'USA', 'UK', or 'NLD'), a new partition is automatically created to hold the data, and that partition will have three hash subpartitions to distribute the data, all without any user intervention.
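A short sketch of what that looks like in practice: inserting previously unseen country codes and then listing the partitions of the parent (each automatically created partition should in turn carry three hash subpartitions):

INSERT INTO list_hash_tab VALUES ('USA', 1, 2);
INSERT INTO list_hash_tab VALUES ('UK', 3, 4);

SELECT c.relname
FROM   pg_inherits i
       JOIN pg_class c ON c.oid = i.inhrelid
WHERE  i.inhparent = 'list_hash_tab'::regclass;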

Migrating and maintaining such partitioned tables with subpartition schemes would otherwise be a huge effort; with EDB Postgres Advanced Server 13, all of the required syntax and capabilities are part of the package.

 

Conclusion

EDB Postgres Advanced Server builds on top of PostgreSQL to enhance and expand functionality and adds Oracle-like syntax to leverage these functionalities. With the latest release of EDB Postgres Advanced Server 13, DDL enhancements allow you to define custom index settings and set parallelization options per index at table creation time. Complex partitioning schemes can be defined with simplified syntax and automatic capabilities which reduce manual efforts for maintenance. The DDL supports an Oracle-like syntax which lowers migration risk and effort.

 

 