Configurable LZ4 TOAST compression

May 04, 2021

This article introduces configurable lz4 TOAST compression feature which will be available in PostgreSQL version 14, and demonstrates its usage.

  1. Background
  2. Configurable compression method
  3. Future improvement
  4. Demonstration
  5. PostgreSQL commit

 

Background

TOAST is a mechanism PostgreSQL uses to keep physical data rows from exceeding the size of a data block (typically 8KB). PostgreSQL does not support physical rows that cross block boundaries, so the block size is a hard upper limit on row size. To allow user tables to have rows wider than this, the TOAST mechanism breaks up wide field values into smaller pieces, which are stored "out of line" in a TOAST table associated with the user table.

Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user, and enabled by default.

When a row that is to be stored is "too wide" (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide field values. If that isn't enough to get the row under 2KB, it breaks up the wide field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this "out of line" data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that's good enough and the row can be stored successfully.

 

Configurable compression method

Currently, there is only one algorithm to compress the data in PostgreSQL that is pglz. Which is a very old homegrown algorithm. There is now a per-column COMPRESSION option which can be set to either pglz (which is default) or lz4. A new postgresql.conf configuration parameter, default_toast_compression, is introduced to set the default compression method with default value being pglz. The value of this configuration parameter is used for new table columns when no compression method is specified. We don't have lz4 support in the core PostgreSQL, so to use lz4 compression, PostgreSQL must be built --with-lz4.

 

Future improvements

In future we can support more compression methods as built-in and also we can enhance the infrastructure to support user defined custom compression methods.

 

Demonstration


Provide compression method while creating a table

postgres[42821]=# CREATE TABLE cmdata(f1 text COMPRESSION lz4);
CREATE TABLE

postgres[42821]=# \d+ cmdata
                                        Table "public.cmdata"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 f1     | text |           |          |         | extended | lz4         |              | 


postgres[42821]=# INSERT INTO cmdata VALUES(repeat('1234567890', 1000));
INSERT 0 1

Check the compression method of the compressed data.

postgres[42821]=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression 
-----------------------
 lz4
(1 row)


Alter compression method of the existing column
The existing data will not be rewritten but any new data will be compressed with the new compression method

postgres[42821]=# ALTER TABLE cmdata ALTER COLUMN f1 SET COMPRESSION pglz;
ALTER TABLE
postgres[42821]=# \d+ cmdata
                                        Table "public.cmdata"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 f1     | text |           |          |         | extended | pglz        |              | 

postgres[42821]=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression 
-----------------------
 lz4
(1 row)

Insert new data and check the compression method of the data

postgres[42821]=# INSERT INTO cmdata VALUES (repeat('123456789', 4004));
INSERT 0 1

postgres[42821]=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression 
-----------------------
 lz4
 pglz
(2 rows)

VACUUM FULL/CLUSTER can be used for re-compressing all the existing data with the current compression method of the column

postgres[42821]=# VACUUM FULL cmdata;
VACUUM
postgres[42821]=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression 
-----------------------
 pglz
 pglz
(2 rows)


default_toast_compression,  postgresql.conf configuration parameter for setting default toast compression method

postgres[42821]=# SHOW default_toast_compression ;
 default_toast_compression 
---------------------------
 pglz
(1 row)

postgres[42821]=# CREATE TABLE cmdata1(f1 text);
CREATE TABLE
postgres[42821]=# \d+ cmdata1
                                        Table "public.cmdata1"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 f1     | text |           |          |         | extended | pglz        |              | 

postgres[42821]=# SET default_toast_compression TO lz4;
SET
postgres[42821]=# CREATE TABLE cmdata2(f1 text);
CREATE TABLE
postgres[42821]=# \d+ cmdata2
                                        Table "public.cmdata2"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 f1     | text |           |          |         | extended | lz4         |              | 



 

PostgreSQL commit

This feature has been committed to PostgreSQL and will be available in PostgreSQL version 14.

Share this

More Blogs