October 7, 2020
Logical replication in PostgreSQL follows a publish-subscribe model, where a subscriber node can subscribe to a publisher and receive logical changes from it in real time. When a subscription is created, a walsender process is spawned on the publisher node. The walsender is responsible for reading the WAL record by record and reassembling the changes belonging to each transaction. It keeps these changes in memory, spilling them to local files if they exceed the memory limit. On transaction commit, it decodes the corresponding changes and sends them to the subscribers, where they are applied incrementally. The transfer of this decoded WAL is limited by the network bandwidth, so the subscriber can be delayed in applying the WAL and catching up with the publisher. This delay on the apply side is called apply-lag.
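The apply-lag described above can be observed from the publisher side. As a sketch (the view and columns below are the standard pg_stat_replication statistics view; replay_lag is the closest match for apply-lag):

```sql
-- On the publisher: per-subscriber replication lag as seen by the walsender.
-- replay_lag reports how far behind the downstream is in applying changes,
-- i.e. the apply-lag described above.
SELECT application_name, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
```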
Processing large transactions often results in significant apply-lag because, as described above, all the changes are sent downstream only at commit time. For a large transaction the network transfer can take a long time, and that time shows up directly as apply-lag.
Logical replication of the in-progress transactions
In PostgreSQL 14, we will allow streaming of in-progress transactions instead of spilling them to local files. On the apply side, we simply receive the streamed changes and write them into files. At commit time, the changes are replayed locally, without having to wait for a large chunk of data to be relayed over the network. There are two major advantages to doing this:
- No need to send the large data over the network at the commit time.
- If there are major changes to unrelated (non-published) tables, those changes can be discarded early instead of being spilled to local files.
PostgreSQL 13 added a new configuration parameter, logical_decoding_work_mem, which controls the amount of memory a walsender process may use to hold changes before spilling them to disk. The same parameter now also regulates when changes are sent downstream: whenever the in-memory changes in the walsender process cross this limit, we select the transaction with the largest in-memory change size and stream all of its changes. This process is repeated until we come back under the memory limit.
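The eviction loop just described can be sketched as follows. This is a minimal illustration in Python of the policy, not PostgreSQL's actual C implementation; the function name and data shapes are invented for the example:

```python
def evict_until_under_limit(txn_sizes, work_mem):
    """Sketch of the walsender eviction policy: while total buffered
    change size exceeds work_mem, pick the transaction with the largest
    in-memory footprint and stream all of its changes.

    txn_sizes: dict mapping xid -> bytes currently buffered
    Returns the list of xids streamed, in eviction order.
    """
    streamed = []
    while txn_sizes and sum(txn_sizes.values()) > work_mem:
        # Select the largest transaction based on its in-memory change size.
        largest = max(txn_sizes, key=txn_sizes.get)
        streamed.append(largest)
        txn_sizes[largest] = 0  # its changes are sent downstream, memory freed
    return streamed

# 80 kB buffered against a 64 kB limit: streaming the largest
# transaction (xid 501) is enough to get back under the limit.
sizes = {501: 40_000, 502: 30_000, 503: 10_000}
print(evict_until_under_limit(sizes, 64_000))  # -> [501]
```

Note that only whole transactions are evicted, which is why one large eviction can free much more memory than strictly needed.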
How to decode in-progress transactions
To decode the changes of a running transaction with the test_decoding plugin, pass the new option 'stream-changes' to pg_logical_slot_get_changes:
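The session below assumes a logical replication slot named regression_slot already exists. If it does not, it can be created with the test_decoding output plugin first:

```sql
-- Create a logical replication slot using the test_decoding output plugin.
-- 'regression_slot' is the slot name queried in the session below.
SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding');
```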
postgres=# SET logical_decoding_work_mem = '64kB';
SET
postgres=# CREATE TABLE stream_test(data text);
CREATE TABLE
postgres=# BEGIN;
BEGIN
postgres=#* INSERT INTO stream_test SELECT repeat('a', 2000) || g.i FROM generate_series(1, 35) g(i);
INSERT 0 35
postgres=#* SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'stream-changes', '1');
                       data
--------------------------------------------------
 BEGIN 553
 COMMIT 553
 opening a streamed block for transaction TXN 554
 streaming change for TXN 554
 streaming change for TXN 554
 streaming change for TXN 554
 streaming change for TXN 554
 streaming change for TXN 554
 streaming change for TXN 554
 .....
 closing a streamed block for transaction TXN 554
(35 rows)
To enable streaming of in-progress transactions in a logical replication setup, we need to pass a new option (streaming = on) while creating the subscription, e.g.
CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = on);
[publisher configuration]:
wal_level = logical
synchronous_standby_names = '*'   [set so that commit waits for a response from the downstream]
logical_decoding_work_mem = 10MB

[publisher setup]:
CREATE TABLE publish_table(a int, b text);
CREATE TABLE non_publish_table(a int, b text);
CREATE PUBLICATION test_pub FOR TABLE publish_table;

[subscriber setup]:
CREATE TABLE publish_table(a int, b text);
CREATE SUBSCRIPTION test_sub CONNECTION 'host=127.0.0.1 port=5432 dbname=postgres' PUBLICATION test_pub;

[performance test]:
Execute a large transaction on the publisher and measure the total commit time (local commit + subscriber apply + ack from the subscriber). The transaction operates on published as well as non-published tables.

BEGIN;
INSERT INTO non_publish_table SELECT i, REPEAT('x', 10) FROM generate_series(1, 5000000) AS i;
INSERT INTO publish_table SELECT i, REPEAT('x', 10) FROM generate_series(1, 1000000) AS i;
COMMIT;

Time taken for COMMIT: 57087.004 ms

Enable streaming and recheck the performance:

ALTER SUBSCRIPTION test_sub SET (streaming = on);

Time taken for COMMIT: 23967.789 ms
We can see a significant reduction in commit time when streaming of in-progress transactions is enabled. This is mainly because:
- Unrelated changes are not spilled to the disk which reduces the I/O cost.
- Instead of waiting for the transaction to commit, we continuously send changes to the subscriber while the transaction is still being processed on the publisher.
Further scope of improvement
Currently, on the apply side, we flush the changes of in-progress transactions into files on the subscriber, and only at commit time do we read those changes back and apply them. The reason is that the changes of concurrent in-progress transactions can be interleaved in the stream, while on the subscriber all the changes of one transaction must be applied within a single subscriber-side transaction. To reduce the apply-lag further, we could start subscription-side workers and apply the changes immediately instead of flushing them to files. But there are still a number of problems to be solved, so it is unclear when this improvement will be available.
This feature has been committed to PostgreSQL and will be available in PostgreSQL version 14.
Dilip works as an Associate Database Architect at EDB. He is currently working on database internals and is associated with the team developing an undo-based storage engine for open source PostgreSQL. Dilip has been working on PostgreSQL code since 2010 and is a PostgreSQL contributor. His major work includes parallel query and performance improvements for multi-core machines. In the past, he has worked on various database projects, including integrating an in-memory storage engine with PostgreSQL.