Release notes for EDB Postgres Distributed version 4.0.1

This is a maintenance release for BDR 4.0 and HARP 2.0 that includes minor enhancements as well as fixes for issues identified in previous versions.

BDR 4.0.1, Enhancement: Reduce frequency of CAMO partner connection attempts.

If a connection to a CAMO partner (made to verify its configuration and check the status of transactions) fails, BDR no longer retries immediately, which could leave the pglogical manager process fully busy. Instead, repeated reconnection attempts and checks are throttled to once per minute.
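The throttling idea can be illustrated with a minimal Python sketch. It is not BDR's implementation: the class name, the method name, and the injectable clock are all assumptions made for illustration, and only the once-per-minute interval comes from the note above.

```python
import time


class ReconnectThrottle:
    """Allow at most one connection attempt per `min_interval` seconds.

    Hypothetical helper for illustration; not part of BDR or pglogical.
    """

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self.last_attempt = None    # no attempt has been made yet

    def may_attempt(self):
        """Return True (and record the attempt) if enough time has passed."""
        now = self.clock()
        if self.last_attempt is None or now - self.last_attempt >= self.min_interval:
            self.last_attempt = now
            return True
        return False
```

A caller would check `may_attempt()` before each reconnect and otherwise go back to waiting, which keeps the process idle between attempts instead of spinning.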

BDR 4.0.1, Enhancement: Implement buffered read for LCR segment file (BDR-1422)

Implement LCR segment file buffering so that multiple LCR chunks can be read at a time. This should reduce I/O and improve CPU usage of WAL senders when using the Decoding Worker.
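As a rough illustration of why buffering helps, the following Python sketch reads a stream in large blocks and parses every complete chunk found in each block. The on-disk format here (a 4-byte big-endian length prefix followed by the payload) is purely an assumption for the example. The actual LCR segment format is not described in these notes.

```python
import io
import struct

HEADER = struct.Struct(">I")  # hypothetical 4-byte big-endian length prefix


def read_chunks(stream, buffer_size=64 * 1024):
    """Yield length-prefixed chunks, refilling the buffer with large reads."""
    buf = b""
    while True:
        data = stream.read(buffer_size)  # one large read instead of many small ones
        if not data:
            break
        buf += data
        offset = 0
        # Parse every complete chunk currently held in the buffer.
        while len(buf) - offset >= HEADER.size:
            (length,) = HEADER.unpack_from(buf, offset)
            end = offset + HEADER.size + length
            if end > len(buf):
                break  # chunk is incomplete; wait for the next read
            yield buf[offset + HEADER.size:end]
            offset = end
        buf = buf[offset:]  # keep only the unparsed tail


# Usage: three chunks packed into an in-memory "segment".
segment = b"".join(HEADER.pack(len(p)) + p for p in (b"one", b"two", b"three"))
chunks = list(read_chunks(io.BytesIO(segment)))
```

With a buffer larger than a typical chunk, a single read call serves many chunks, which is the effect the enhancement aims for.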

BDR 4.0.1, Enhancement: Avoid unnecessary LCR segment reads (BDR-1426)

BDR now attempts to read new LCR segments only when at least one is available. This reduces I/O load when the Decoding Worker is enabled.

BDR 4.0.1, Enhancement: Performance of COPY replication, including the initial COPY during join, has been greatly improved for partitioned tables (BDR-1479)

For large tables, this can improve load times by an order of magnitude or more.

BDR 4.0.1, Bug fix: Fix the parallel apply worker selection (BDR-1761)

This makes parallel apply work again. In 4.0.0 parallel apply was never in effect due to this bug.

BDR 4.0.1, Bug fix: Fix Raft snapshot handling of bdr.camo_pairs (BDR-1753)

The previous release would not correctly propagate changes to the CAMO pair configuration when they were received via Raft snapshot.

BDR 4.0.1, Bug fix: Correctly handle Raft snapshots from BDR 3.7 after upgrades (BDR-1754)

BDR 4.0.1, Bug fix: Upgrading a CAMO-configured cluster now takes into account the bdr.camo_pairs in the snapshot, while still excluding the ability to perform an in-place upgrade of a cluster (due to upgrade limitations unrelated to CAMO).

BDR 4.0.1, Bug fix: Switch from CAMO to Local Mode only after timeouts (RT74892)

Do not use the catchup_interval estimate when switching from CAMO-protected to Local Mode, as that could induce inadvertent switching due to load spikes. Use the estimate only when switching from Local Mode back to CAMO-protected (to prevent toggling back and forth due to lag on the CAMO partner).
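The asymmetric rule described above can be sketched as a small decision function. This is only an illustration of the logic as stated in this note. The function name and all parameters (`partner_unreachable_for`, `timeout`, `catchup_estimate`, `max_catchup`) are hypothetical and do not correspond to actual BDR settings or internals.

```python
def next_mode(mode, partner_unreachable_for, timeout, catchup_estimate, max_catchup):
    """Return the next mode, either 'camo' or 'local'.

    All parameter names are hypothetical; times are in seconds.
    """
    if mode == "camo":
        # Leave CAMO only on an actual timeout; the lag estimate is ignored
        # here so that a load spike alone cannot trigger the switch.
        return "local" if partner_unreachable_for >= timeout else "camo"
    # In Local Mode: switch back only when the partner responds and its
    # estimated catchup time is small enough, which avoids rapid toggling.
    if partner_unreachable_for == 0 and catchup_estimate <= max_catchup:
        return "camo"
    return "local"
```

For example, `next_mode("camo", 0.0, 5.0, 999.0, 1.0)` stays in `"camo"` despite a very large lag estimate, because no timeout has occurred.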

BDR 4.0.1, Bug fix: Fix replication set cache invalidation when the published replication set list has changed (BDR-1715)

In previous versions, stale information about which replication sets (and, as a result, which tables) should be published could be used until the subscription reconnected.

BDR 4.0.1, Bug fix: Prevent duplicate values generated locally by galloc sequence in high concurrency situations when the new chunk is used (RT76528)

The galloc sequence could temporarily produce duplicate values when switching which chunk is used locally (but not across nodes) if multiple sessions were waiting for the new value. This is now fixed.
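To show the kind of race involved, here is a minimal Python sketch of a chunk-based allocator in which the exhaustion check, the chunk switch, and the increment all happen under one lock. It is an analogy only, under the assumption that chunks are simple (start, end) ranges. The class and method names are invented for the example and say nothing about how BDR fixed the issue internally.

```python
import threading


class ChunkedSequence:
    """Hand out values from the current chunk, switching chunks when exhausted.

    Hypothetical illustration; not the galloc implementation.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)   # iterable of (start, end) ranges, end exclusive
        self._lock = threading.Lock()
        self._next, self._end = next(self._chunks)

    def nextval(self):
        # Check, switch, and increment under a single lock, so sessions that
        # wait for a new chunk can never be handed the same value.
        with self._lock:
            if self._next >= self._end:
                # Raises StopIteration once no chunks remain.
                self._next, self._end = next(self._chunks)
            value = self._next
            self._next += 1
            return value
```

If the chunk switch happened outside the lock, two waiting sessions could both observe the exhausted chunk and both restart from the first value of the new chunk, which is the symptom the fix addresses.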

BDR 4.0.1, Bug fix: Address memory leak on streaming transactions (BDR-1479)

For large transactions, this considerably reduces memory usage and I/O when using the streaming transactions feature. This primarily improves performance of COPY replication.

BDR 4.0.1, Bug fix: Don't leave slot behind after PART_CATCHUP phase of node parting when the catchup source has changed while the node was parting (BDR-1716)

When a node is removed (parted) from a BDR group, a so-called catchup is performed to forward any missing changes from that node between the remaining nodes, keeping the data on all nodes consistent. This requires an additional replication slot to be created temporarily. Normally this slot is removed at the end of the catchup phase. However, in certain scenarios where the source node for the changes had to change, the slot could previously be left behind. From this version on, the slot is always correctly removed.

BDR 4.0.1, Bug fix: Ensure that the group slot is moved forward when there is only one node in the BDR group

This prevents disk exhaustion due to WAL accumulation when the group is left running with just a single BDR node for a prolonged period of time. This is not a recommended setup, but the WAL accumulation was not intentional.

BDR 4.0.1, Bug fix: Advance Raft protocol version when there is only one node in the BDR group

Single-node clusters would otherwise always stay on the oldest supported protocol version until another node was added. This could limit the feature set available on that single node.

HARP 2.0.1, Enhancement: Support for selecting a leader per location rather than relying on a DCS like etcd to have a separate setup in different locations. This still requires a majority of nodes to survive the loss of a location, so an odd number of both locations and database nodes is recommended.
HARP 2.0.1, Enhancement: The BDR DCS now uses a push notification from the consensus rather than polling nodes. This change reduces the time for new leader selection and the load that HARP puts on the BDR DCS, since it no longer needs to poll at short intervals.
HARP 2.0.1, Enhancement: TPA now restarts each HARP Proxy one by one and waits until it comes back, to reduce any downtime incurred by the application during software upgrades.
HARP 2.0.1, Enhancement: The support for embedding PgBouncer directly into HARP Proxy is now deprecated and will be removed in the next major release of HARP. It's now possible to configure TPA to put PgBouncer on the same node as HARP Proxy and point to that HARP Proxy.
HARP 2.0.1, Bug fix: harpctl promote <node_name> would occasionally promote a different node than the one specified. This has been fixed. [Support Ticket #75406]
HARP 2.0.1, Bug fix: Fencing would sometimes fail when using BDR as the Distributed Consensus Service. This has been corrected.
HARP 2.0.1, Bug fix: harpctl apply no longer turns off routing for the leader after the cluster has been established. [Support Ticket #80790]
HARP 2.0.1, Bug fix: Harp-manager no longer exits if it cannot start a failed database. Harp-manager will keep retrying with randomly increasing periods. [Support Ticket #78516]
HARP 2.0.1, Bug fix: The internal PgBouncer proxy implementation had a memory leak. This has been remediated.
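
The "randomly increasing periods" mentioned in the Harp-manager retry fix above can be pictured with a short Python sketch of randomized backoff. The growth factor range, the base delay, and the cap are assumptions chosen for illustration. The notes do not state the values or the exact scheme Harp-manager uses.

```python
import random


def retry_delays(base=1.0, cap=60.0, rng=random):
    """Yield an endless series of delays that grow by a random factor, up to `cap`.

    Hypothetical helper; `base`, `cap`, and the 1x-2x growth range are assumed.
    """
    delay = base
    while True:
        yield delay
        # Grow by a random factor between 1x and 2x so delays never shrink.
        delay = min(cap, delay * rng.uniform(1.0, 2.0))
```

A retry loop would sleep for each yielded delay between start attempts. The randomness spreads out retries so that several managers restarting at once do not retry in lockstep.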