Declarative partitioning got some attention in the PostgreSQL 12 release, with some very handy features. There has been some pretty dramatic improvement in partition selection (especially when selecting from a few partitions out of a large set), referential integrity improvements, and introspection.
In this article, we’re going to tackle the referential integrity improvement first. This will provide some sample data to use later for the other explanations. And the feature is just amazingly cool, so it goes first anyway.
This example builds on the example given for the Generated columns in PostgreSQL 12 article, where we built a media calendar by calculating everything you ever wanted to know about a date. Here’s the short version of the code:
CREATE TABLE public.media_calendar (
gregorian date NOT NULL PRIMARY KEY,
month_int integer GENERATED ALWAYS AS (date_part('month'::text, gregorian)) STORED,
day_int integer GENERATED ALWAYS AS (date_part('day'::text, gregorian)) STORED,
year_int integer GENERATED ALWAYS AS (date_part('year'::text, gregorian)) STORED,
quarter_int integer GENERATED ALWAYS AS (date_part('quarter'::text, gregorian)) STORED,
dow_int integer GENERATED ALWAYS AS (date_part('dow'::text, gregorian)) STORED,
doy_int integer GENERATED ALWAYS AS (date_part('doy'::text, gregorian)) STORED
...snip...
);
INSERT INTO public.media_calendar (gregorian)
SELECT '1900-01-01'::date + x
-- Starting with 1900-01-01, fill the table with 200 years of data.
FROM generate_series(0,365*200) x;
Now, we’re going to add a time dimension to our model, and relate the date and time together for a 200 year calendar that’s accurately computed to the second.
CREATE TABLE time_dim (
time_of_day time without time zone not null primary key,
hour_of_day integer GENERATED ALWAYS AS (date_part('hour', time_of_day)) stored,
minute_of_day integer GENERATED ALWAYS AS (date_part('minute', time_of_day)) stored,
second_of_day integer GENERATED ALWAYS AS (date_part('second', time_of_day)) stored,
morning boolean GENERATED ALWAYS AS (date_part('hour',time_of_day)<12) stored,
afternoon boolean GENERATED ALWAYS AS (date_part('hour',time_of_day)>=12 AND date_part('hour',time_of_day)<18) stored,
evening boolean GENERATED ALWAYS AS (date_part('hour',time_of_day) >= 18) stored
);
INSERT INTO time_dim (time_of_day )
SELECT '00:00:00'::time + (x || ' seconds')::interval
FROM generate_series (0,24*60*60-1) x; -- start with midnight, add seconds in a day;
We should now have 86400 rows in the time dimension, and 73001 rows in our 200 year media calendar. Of course, when we decide to relate these together, a cartesian join produces a bit over 6 billion rows (6,307,286,400). The good news is that this table is unlikely to grow, unless Ceasar decides to add more days to the year, or the EU decides to add more seconds to a day (grumble, grumble). So, it makes a good candidate to partition, with a very easily calculated key.
CREATE TABLE hours_to_days (
day date not null references media_calendar(gregorian),
time_of_day time without time zone not null references time_dim(time_of_day),
full_date timestamp without time zone GENERATED ALWAYS AS (day + time_of_day) stored,
PRIMARY KEY (day,time_of_day)
) PARTITION BY RANGE (day);
CREATE INDEX idx_natural_time ON hours_to_days(full_date);
You just saw a new feature that was created in PostgreSQL 11 (not a typo, I mean 11). You may have a parent->child foreign key that references a partitioned table.
Ok, we were allowed to do that, so let’s get on with the PostgreSQL 12 partitioning lesson.
CREATE TABLE hours_to_days_ancient PARTITION OF hours_to_days
FOR VALUES FROM (minvalue) TO ('1990-01-01');
CREATE TABLE hours_to_days_sep PARTITION OF hours_to_days
FOR VALUES FROM ('2040-01-01') TO (maxvalue);
CREATE TABLE hours_to_days_1990 PARTITION OF hours_to_days
FOR VALUES FROM ('1990-01-01') TO ('2000-01-01');
CREATE TABLE hours_to_days_2000 PARTITION OF hours_to_days
FOR VALUES FROM ('2000-01-01') TO ('2010-01-01');
CREATE TABLE hours_to_days_2010 PARTITION OF hours_to_days
FOR VALUES FROM ('2010-01-01') TO ('2020-01-01');
CREATE TABLE hours_to_days_2020 PARTITION OF hours_to_days
FOR VALUES FROM ('2020-01-01') TO ('2030-01-01');
CREATE TABLE hours_to_days_2030 PARTITION OF hours_to_days
FOR VALUES FROM ('2030-01-01') TO ('2040-01-01');
Notice that the partitions do not have to be evenly distributed in the range, the data quantity, or any other criteria. The only requirement is that all dates are included in one (and only one) partition.
INSERT INTO hours_to_days (day, time_of_day)
SELECT gregorian, time_of_day
FROM media_calendar
CROSS JOIN time_dim;
Now, go get some coffee, because we’re going to get 6.3B rows.
Now, we’re finally going to get to the first PostgreSQL 12 enhancement. In the latest version of PostgreSQL, you may have a foreign key relationship where the partitioned table is the child.
CREATE TABLE sale (
id bigserial primary key,
transaction_date date not null default now()::date,
transaction_time time without time zone not null default date_trunc('seconds', now()::time),
FOREIGN KEY (transaction_date, transaction_time) REFERENCES hours_to_days(day,time_of_day)
);
Wow! Well, “”wow” for people who can get excited about code. This means that you can have a partitioned dimensional model! You can have partitioned OLAP! You can have partitioned geophysical data, or any other kind of data, without losing referential integrity. That’s big news to data modeling at the edge of the diagram.
Now let’s look at the partitions that we just created. How, you ask? Well, with the new introspection tools in PostgreSQL 12, of course. Those are:
pg_partition_tree, pg_partition_ancestors, pg_partition_root
Let’s explore those with the partitions we created.
When we look at our partitioned parent table, the results are underwhelming:
\d hours_to_days
Partitioned table "public.hours_to_days"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+------------------------------------------------
day | date | | not null |
time_of_day | time without time zone | | not null |
full_date | timestamp without time zone | | | generated always as (day + time_of_day) stored
Partition key: RANGE (day)
Indexes:
"hours_to_days_pkey" PRIMARY KEY, btree (day, time_of_day)
"idx_natural_time" btree (full_date)
Foreign-key constraints:
"hours_to_days_day_fkey" FOREIGN KEY (day) REFERENCES media_calendar(gregorian)
"hours_to_days_time_of_day_fkey" FOREIGN KEY (time_of_day) REFERENCES time_dim(time_of_day)
Number of partitions: 7 (Use \d+ to list them.)
We see a bit of the partition info, but not anywhere near what we’d like to know. We get a bit more with enhancing:
\dS+ hours_to_days --<-- note the Splus
Partitioned table "public.hours_to_days"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+-----------------------------+-----------+----------+------------------------------------------------+---------+--------------+-------------
day | date | | not null | | plain | |
time_of_day | time without time zone | | not null | | plain | |
full_date | timestamp without time zone | | | generated always as (day + time_of_day) stored | plain | |
Partition key: RANGE (day)
Indexes:
"hours_to_days_pkey" PRIMARY KEY, btree (day, time_of_day)
"idx_natural_time" btree (full_date)
Foreign-key constraints:
"hours_to_days_day_fkey" FOREIGN KEY (day) REFERENCES media_calendar(gregorian)
"hours_to_days_time_of_day_fkey" FOREIGN KEY (time_of_day) REFERENCES time_dim(time_of_day)
Partitions: hours_to_days_1990 FOR VALUES FROM ('1990-01-01') TO ('2000-01-01'),
hours_to_days_2000 FOR VALUES FROM ('2000-01-01') TO ('2010-01-01'),
hours_to_days_2010 FOR VALUES FROM ('2010-01-01') TO ('2020-01-01'),
hours_to_days_2020 FOR VALUES FROM ('2020-01-01') TO ('2030-01-01'),
hours_to_days_2030 FOR VALUES FROM ('2030-01-01') TO ('2040-01-01'),
hours_to_days_ancient FOR VALUES FROM (MINVALUE) TO ('1990-01-01'),
hours_to_days_sep FOR VALUES FROM ('2040-01-01') TO (MAXVALUE)
Ok, now we see a list of partitions. In the interest of shortening this article a bit, I won’t give the sub-partitioning example. However, trust me to say that if sub partitions existed, this method would not list them.
SELECT * FROM pg_partition_tree('hours_to_days');
relid | parentrelid | isleaf | level
-----------------------+---------------+--------+-------
hours_to_days | | f | 0
hours_to_days_ancient | hours_to_days | t | 1
hours_to_days_sep | hours_to_days | t | 1
hours_to_days_1990 | hours_to_days | t | 1
hours_to_days_2000 | hours_to_days | t | 1
hours_to_days_2010 | hours_to_days | t | 1
hours_to_days_2020 | hours_to_days | t | 1
hours_to_days_2030 | hours_to_days | t | 1
Here we would see any sub partitions and the partition levels. Here we have “level” to identify the node priority, including “0” which is the root node, and “parentrelid” to show node ownership. With that basic information, we can easily build a relationship tree.
We also have another, even simpler way to get to the root node.
SELECT * FROM pg_partition_root('hours_to_days_sep');
pg_partition_root
-------------------
hours_to_days
(1 row)
As well as the other way around. This shows the inheritance tree from any branch backwards toward the root.
SELECT * FROM pg_partition_ancestors('hours_to_days_sep');
relid
-------------------
hours_to_days_sep
hours_to_days
(2 rows)
And if we are using psql for a client, we have a new internal command to show partitions and indexes.
\dP
List of partitioned relations
Schema | Name | Owner | Type | Table
--------+---------------------+---------+-------------------+----------------
public | hours_to_days | kroybal | partitioned table |
public | media_calendar | kroybal | partitioned table |
public | hours_to_days_pkey | kroybal | partitioned index | hours_to_days
public | idx_natural_time | kroybal | partitioned index | hours_to_days
public | media_calendar_pkey | kroybal | partitioned index | media_calendar
(5 rows)
Following in the footsteps of many other commands,
ALTER TABLE ... ATTACH PARTITION
has eliminated the need for an EXCLUSIVE lock. This means that you can create new partitions, and add them to the partition set at run time, without using a maintenance window. Unfortunately, the reverse is not true. ALTER TABLE … DETACH PARTITION is still EXCLUSIVE lock dependent, so on-the-fly detachment still needs a lock, if only very briefly.
Several more improvements have been made, that really require no extended explanation:
1. The COPY command has reduced a bit of overhead, allowing for faster loading.
2. The tablespace specification for a parent is now inherited by the child.
3. pg_catalog.pg_indexes now shows indexes on partitioned children.
And that wraps it up for the new enhancements. Stay tuned for more articles about other features that will appear in PostgreSQL 12.