Row Movement Across PostgreSQL Partitions Made Easy

amit.khandekar_enterprisedb.com's picture
Author: Amit Khandekar
5/15/2018

Before declarative partitioning was introduced in PostgreSQL 10, one had to use table inheritance. One of the main drawbacks of using table inheritance is that ROW triggers are needed for the parent table to automatically route the inserted row to the right child table, which is cumbersome and slow. With declarative partitioning, when you insert a row into a partitioned table, that row is transparently routed to the right partition without the need for triggers.

But what if a row is updated such that it no longer fits in the current partition? In PostgreSQL 10, we get a constraint violation error, as shown below:

postgres=# CREATE TABLE emp_root(id int, dept varchar, location int) PARTITION BY LIST(dept);

postgres=# CREATE TABLE emp_subroot PARTITION OF emp_root FOR VALUES IN ('dept1', 'dept2') PARTITION BY RANGE(location);

postgres=# CREATE TABLE subroot_part_1 PARTITION OF emp_subroot FOR VALUES FROM (1) to (10);

postgres=# CREATE TABLE subroot_part_2 PARTITION OF emp_subroot FOR VALUES FROM (11) to (20);

postgres=# CREATE TABLE emp_subroot2 PARTITION OF emp_root FOR VALUES IN ('dept3', 'dept4');

postgres=#

postgres=# INSERT INTO emp_root values (1, 'dept1', 5), (2, 'dept2', 16), (3, 'dept3', 7), (4, 'dept4', 17);

postgres=# SELECT tableoid::regclass, * from emp_root ORDER BY id;

   tableoid    | id | dept | location

----------------+----+-------+----------

subroot_part_1 |  1 | dept1 | 5

subroot_part_2 |  2 | dept2 | 16

emp_subroot2   | 3 | dept3 |      7

emp_subroot2   | 4 | dept4 |     17

(4 rows)

 

postgres=# UPDATE emp_root SET dept = 'dept1' WHERE id = 4;

ERROR:  new row for relation "emp_subroot2" violates partition constraint

DETAIL:  Failing row contains (4, dept1, 17).

 

Above, we are updating the partition key 'dept' of a row in partition emp_subroot2, and the new value 'dept1' does not qualify the row for that partition, but it qualifies for partition subroot_part_2.

Again, if we want to automate the row movement, we need to rely on triggers that would then route it to the right partition.

But no longer in PostgreSQL 11. If the row can be accommodated into another partition, it simply moves there. So in the above case, the row moves into partition subroot_part_2, as shown below:

postgres=# UPDATE emp_root SET dept = 'dept1' WHERE id = 4;

postgres=# SELECT tableoid::regclass, * from emp_root ORDER BY id;

   tableoid    | id | dept | location

----------------+----+-------+----------

subroot_part_1 |  1 | dept1 |  5

subroot_part_2 |  2 | dept2 | 16

emp_subroot2   | 3 | dept3 |   7

subroot_part_2 |  4 | dept1 | 17

(4 rows)

As you can see, this is a practical scenario where an employee changes his/her department, and the tables are partitioned by department, so the row has to move to a different partition.

Scope of row movement

Note that the row movement is attempted only within the partition tree under the partitioned table specified in the UPDATE statement. For example, the below UPDATE fails because the updated row does not accommodate into any of the partitions under emp_subroot, although it can fit into emp_subroot2 which is outside of emp_subroot tree:

postgres=# UPDATE emp_subroot SET dept = 'dept4' WHERE id = 1;

ERROR:  new row for relation "subroot_part_1" violates partition constraint

DETAIL:  Failing row contains (1, dept4, 5).

Behind the scenes, it's a DELETE and INSERT

The row movement is internally achieved by deleting the original row and inserting the modified row into the right partition. Users have to be aware of this behavior, because its implications may sound surprising in case of a few scenarios as described below:
 

  1. Row Triggers defined on the partitions.

    Since the DELETE happens on the source partition, any ON DELETE ROW trigger defined for that partition will be executed, just like it would on a normal table. The same applies for INSERT triggers on the destination partition. It gets trickier when a BEFORE UPDATE row trigger is defined on the source partition. This trigger may even update the partition key and result in row movement. Further, once the UPDATE is converted into a DELETE and INSERT, all applicable UPDATE triggers are skipped; namely AFTER UPDATE row triggers in the source partition, and BEFORE/AFTER UPDATE row triggers on the destination partition.

  2. Concurrency anomalies

     There is a concurrency scenario where an UPDATE or a DELETE on a partition is about to update/delete a row, but before that happens, the row concurrently gets deleted by another operation that involved UPDATE of its partition-key. In such a case, the former operation returns an error that looks something like this:

    ERROR: tuple to be locked was already moved to another partition due to concurrent update 

  3. Oids change

    Since the original and the new row belong to different partitions, the OID of the row changes. This does not happen for an UPDATE that does not move the row. 

  4. Default partition

    If the updated row cannot be accommodated into any of the partitions, and if there is a default partition present, that row moves into the default partition.

UPDATE row movement with other features

The above explained trigger behavior sounds surprising, but note that it behaves as such because the triggers are defined on partitions, and not on the partitioned table that is used in the UPDATE statement. Instead, if we consider an UPDATE statement trigger on the partitioned table, that trigger does execute even when the row is moved. This is because the trigger is defined on the partitioned table, rather than the underlying partitions.

Similarly, there are other objects that can be defined for an UPDATE operation on a partitioned table, and they continue to behave without change even while there is row movement happening underneath. For instance, consider a Row Security Policy defined for UPDATE on the root partitioned table:

postgres=# CREATE USER user1;

postgres=# ALTER TABLE emp_root ENABLE ROW LEVEL SECURITY;

postgres=# GRANT ALL ON emp_root TO user1;

postgres=# CREATE POLICY seeall ON emp_root AS PERMISSIVE FOR SELECT USING (true);

postgres=# CREATE POLICY update_id ON emp_root for UPDATE USING (true) WITH CHECK (id <= 50);

This policy does not allow existing rows to change their id column to a value greater than 50. The following UPDATE is supposed to move a row into a partition that allows dept = ‘dept3’. It returns an error because the id column value is updated to 51.
 

postgres=# \c postgres user1;

postgres=> UPDATE emp_root SET dept = 'dept3', id = 51  WHERE id = 1;

ERROR:  new row violates row-level security policy for table "emp_root"
 

So the UPDATE policy gets applied even though, behind the scenes, it is a DELETE and INSERT. This is because the policy belongs to the table that is specified in the UPDATE statement. It does not belong to any of the underlying partitions. 

On similar lines, transition tables referenced in UPDATE statement triggers defined on a partitioned table, will continue to show the OLD TABLE and NEW TABLE rows even when some of the rows were moved across partitions.

Contributions

While this feature was authored by me, there were significant design inputs and suggestions given by Robert Haas. My work went through a significant review cycle, thanks to Amit Langote and David Rowley. It was further reviewed by Amit Kapila, Dilip Kumar, Thomas Munro and Álvaro Herrera, and tested by Rajkumar Raghuwanshi.

Amit Khandekar is a Technical Architect at EnterpriseDB.