Release notes for EDB Postgres Distributed version 4.1.0

Released: 17 May 2022

EDB Postgres Distributed version 4.1.0 includes the following:

CLI 1.0.0 Feature: Ability to gather information such as the current state of replication, consensus, and nodes for an EDB Postgres Distributed cluster using the new command-line interface (CLI).
BDR 4.1.0 Feature: Support in-place major upgrades of Postgres on a data node with a new command-line utility, bdr_pg_upgrade. This utility uses the standard pg_upgrade command and reduces the time and network bandwidth needed to do major version upgrades of an EDB Postgres Distributed cluster.
BDR 4.1.0 Feature: Enable the ability to configure a replication lag threshold. After the threshold is met, transaction commits are throttled. This threshold allows limiting RPO without incurring the latency impact on every transaction that comes with synchronous replication.
BDR 4.1.0 Feature: Global sequences are automatically configured based on data type, replacing the need to set up custom sequence handling configuration on every node. The new SnowflakeID algorithm replaces Timeshard, which had limitations.
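
For intuition about how snowflake-style identifiers work in general, here is a minimal sketch. The 41/10/12 bit split, the epoch value, and the function names are all assumptions borrowed from the widely known Snowflake layout. They are not the actual BDR SnowflakeID format, which these notes don't describe.

```python
from typing import Tuple

# Assumed layout (classic Snowflake, NOT confirmed for BDR):
# 41 bits of milliseconds since a custom epoch | 10 bits node id | 12 bits sequence
TIMESTAMP_BITS, NODE_BITS, SEQ_BITS = 41, 10, 12
EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch


def make_id(timestamp_ms: int, node_id: int, seq: int) -> int:
    """Pack timestamp, node id, and per-node sequence into one 63-bit integer."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("seq out of range")
    return (elapsed << (NODE_BITS + SEQ_BITS)) | (node_id << SEQ_BITS) | seq


def split_id(value: int) -> Tuple[int, int, int]:
    """Recover (timestamp_ms, node_id, seq) from a packed identifier."""
    seq = value & ((1 << SEQ_BITS) - 1)
    node_id = (value >> SEQ_BITS) & ((1 << NODE_BITS) - 1)
    timestamp_ms = (value >> (NODE_BITS + SEQ_BITS)) + EPOCH_MS
    return timestamp_ms, node_id, seq
```

Because the timestamp occupies the high bits, identifiers generated later sort after earlier ones regardless of which node produced them, and the node id keeps values from different nodes distinct without coordination.
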
BDR 4.1.0 Feature: Add a new SQL-level interface for configuring synchronous replication durability and visibility options by group rather than by node. This approach allows you to configure all nodes consistently from a single place instead of using config files.
BDR 4.1.0 Feature: Add a new synchronous replication option, Group Commit, which allows a quorum to be required before committing a transaction in an EDB Postgres Distributed group.
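
As a rough illustration of what requiring a quorum means, the sketch below treats a quorum as a strict majority of the group. That rule and the function name are assumptions made for the example. These notes say only that a quorum is required, not how it is computed.

```python
def has_quorum(confirmations: int, group_size: int) -> bool:
    """Return True when a strict majority of the group has confirmed.

    Assumption: quorum == more than half of the nodes in the group.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not 0 <= confirmations <= group_size:
        raise ValueError("confirmations must be between 0 and group_size")
    return confirmations > group_size // 2
```

Under this assumption, a three-node group can commit once two nodes have confirmed, while a four-node group needs three.
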
BDR 4.1.0 Feature: Allow a Raft request to be required for CAMO switching to Local Mode. Add a require_raft flag to the CAMO pairing configuration that controls the behavior of switching from CAMO protected to Local Mode, introducing the option to require a majority of nodes to be connected before switching to Local Mode. (RT78928)
BDR 4.1.0 Feature: Allow replication to continue on ALTER TABLE ... DETACH PARTITION CONCURRENTLY of an already detached partition. Similar to how BDR 4 handles CREATE INDEX CONCURRENTLY when the same index already exists, replication now continues when ALTER TABLE ... DETACH PARTITION CONCURRENTLY is received for a partition that has already been detached. (RT78362)
BDR 4.1.0 Feature: Add additional filtering options to DDL filters. DDL filters allow for replication of different DDL statements to different replication sets, similar to how table membership in a replication set allows DML on different tables to be replicated via different replication sets. This release adds new controls that make it easier to use the DDL filters:
- query_match - if defined, the query must match this regex
- exclusive - if true, other matched filters are not taken into consideration (that is, only the exclusive filter is applied); when multiple exclusive filters match, an error is thrown
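
The interaction between query_match and exclusive can be sketched as follows. This is a simplified model for illustration, not BDR code: the DdlFilter class, its field layout, and the applicable_filters function are hypothetical, and the use of an unanchored, case-sensitive regex search is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DdlFilter:
    """Hypothetical, simplified stand-in for a DDL filter definition."""
    name: str
    replication_set: str
    query_match: Optional[str] = None  # regex the query must match, if defined
    exclusive: bool = False


def applicable_filters(query: str, filters: List[DdlFilter]) -> List[DdlFilter]:
    """Return the filters that apply to a DDL query."""
    # A filter without query_match matches every query.
    matched = [
        f for f in filters
        if f.query_match is None or re.search(f.query_match, query)
    ]
    exclusive = [f for f in matched if f.exclusive]
    if len(exclusive) > 1:
        names = ", ".join(f.name for f in exclusive)
        raise ValueError(f"multiple exclusive DDL filters match: {names}")
    # A single exclusive match suppresses all other matched filters.
    return exclusive if exclusive else matched
```

For example, with a catch-all filter and an exclusive filter whose query_match is `^CREATE INDEX`, a CREATE INDEX statement is routed only through the exclusive filter, while any other statement falls through to the catch-all.
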
BDR 4.1.0 Feature: Add bdr.lock_table_locking configuration variable. When enabled, this changes the behavior of the LOCK TABLE command to take a global DML lock.
BDR 4.1.0 Feature: Implement buffered write for LCR segment file. This should reduce I/O and improve CPU usage of the Decoding Worker.
BDR 4.1.0 Feature: Add support for partial unique index lookups for conflict detection. Indexes on expressions, however, are still not supported for conflict detection. (RT78368)
BDR 4.1.0 Feature: Add additional statistics to bdr.stat_subscription:
- nstream_insert => the count of INSERTs on streamed transactions
- nstream_update => the count of UPDATEs on streamed transactions
- nstream_delete => the count of DELETEs on streamed transactions
- nstream_truncate => the count of TRUNCATEs on streamed transactions
- npre_commit_confirmations => the count of pre-commit confirmations, when using CAMO
- npre_commit => the count of pre-commits
- ncommit_prepared => the count of prepared commits with 2PC
- nabort_prepared => the count of aborts of prepared transactions with 2PC
BDR 4.1.0 Feature: Add execute_locally option to bdr.replicate_ddl_command. This allows optional queueing of DDL commands for replication to other groups without executing them locally. (RT73533)
BDR 4.1.0 Feature: Add fast argument to bdr.alter_subscription_disable(). The argument only influences the behavior of immediate. When set to true (default), it stops the workers without letting them finish the current work. (RT79798)
BDR 4.1.0 Feature: Simplify bdr.{add,remove}_camo_pair functions to return void.
BDR 4.1.0 Feature: Add connectivity/lag check before taking a global lock so that the application or user doesn't have to wait for minutes to get a lock timeout when there are obvious connectivity issues. The level at which the check reports can be set to DEBUG, LOG, WARNING (default), or ERROR.
BDR 4.1.0 Feature: Only log conflicts to conflict log table by default. They are no longer logged to the server log file by default, but this can be overridden.
BDR 4.1.0 Feature: Improve reporting of remote errors during node join.
BDR 4.1.0 Feature: Make autopartition worker's max naptime configurable.
BDR 4.1.0 Feature: Add ability to request partitions up to the given upper bound with autopartition.
BDR 4.1.0 Feature: Don't try to replicate DDL run on a subscribe-only node. It has nowhere to replicate to, so any attempt to do so fails. This is the same as how logical standbys behave.
BDR 4.1.0 Feature: Add bdr.accept_connections configuration variable. When false, walsender connections to replication slots using the BDR output plugin fail. This is useful primarily during restore of a single node from backup.
BDR 4.1.0 Bug fix: Keep the lock_timeout as configured on non-CAMO-partner BDR nodes. A CAMO partner uses a low lock_timeout when applying transactions from its origin node. This was inadvertently done for all BDR nodes rather than just the CAMO partner, which may have led to spurious lock_timeout errors on pglogical writer processes on normal BDR nodes.
BDR 4.1.0 Bug fix: Show a proper wait event for CAMO / Eager confirmation waits. Show correct "BDR Prepare Phase"/"BDR Commit Phase" in bdr.stat_activity instead of the default "unknown wait event". (RT75900)
BDR 4.1.0 Bug fix: Reduce log for bdr.run_on_nodes. Don't log when setting bdr.ddl_replication to off if it's done with the "run_on_nodes" variants of the function. This eliminates the flood of logs for monitoring functions. (RT80973)
BDR 4.1.0 Bug fix: Fix replication of arrays of composite types and arrays of builtin types that don't support binary network encoding.
BDR 4.1.0 Bug fix: Fix replication of data types created during bootstrap.
BDR 4.1.0 Bug fix: Confirm end LSN of the running transactions record processed by the WAL decoder so that the WAL decoder slot remains up to date and the WAL sender gets the candidate in a timely manner.
BDR 4.1.0 Bug fix: Don't wait for autopartition tasks to complete on parting nodes.
BDR 4.1.0 Bug fix: Limit the bdr.standby_slot_names check when reporting flush position only to physical slots. Otherwise flush progress is not reported in presence of disconnected nodes when using bdr.standby_slot_names. (RT77985, RT78290)
BDR 4.1.0 Bug fix: Request feedback reply from walsender if we are close to wal_receiver_timeout.
BDR 4.1.0 Bug fix: Don't record dependency of auto-partitioned table on BDR extension more than once. Recording it more than once resulted in "ERROR: unexpected number of extension dependency records" errors from auto-partition and in broken replication on conflicts.

Note that existing broken tables still need to be fixed manually by removing the double dependency from pg_depend.

BDR 4.1.0 Bug fix: Improve keepalive handling in receiver. Don't update position based on keepalive when in the middle of a streaming transaction, as doing so might lose data on crash. There is also new flush and signaling logic that should improve latency in low TPS scenarios.
BDR 4.1.0 Bug fix: Only do post CREATE commands processing when BDR node exists in the database.
BDR 4.1.0 Bug fix: Don't try to log ERROR conflicts to conflict history table.
BDR 4.1.0 Bug fix: Fixed segfault where a conflict_slot was being used after it was released during multi-insert (COPY). (RT76439)
BDR 4.1.0 Bug fix: Prevent walsender processes spinning when facing lagging standby slots. Correct signaling to reset a latch so that a walsender process doesn't consume 100% of a CPU in case one of the standby slots is lagging behind. (RT80295, RT78290)
BDR 4.1.0 Bug fix: Fix handling of wal_sender_timeout when bdr.standby_slot_names are used. (RT78290)
BDR 4.1.0 Bug fix: Fix reporting of disconnected slots in bdr.monitor_local_replslots. They could have been previously reported as missing instead of disconnected.
BDR 4.1.0 Bug fix: Fix apply timestamp reporting for down subscriptions in bdr.get_subscription_progress() function and in the bdr.subscription_summary that uses that function. It previously reported a garbage value.
BDR 4.1.0 Bug fix: Fix snapshot handling in various places in BDR workers.
BDR 4.1.0 Bug fix: Be more consistent about reporting timestamps and LSNs as NULLs in monitoring functions when there is no available value for those.
BDR 4.1.0 Bug fix: Reduce log information when switching between writer processes.
BDR 4.1.0 Bug fix: Don't do the superuser check when a configuration parameter is specified on the PG command line. Transactions can't be run at that stage, and the change is guaranteed to have been made by a superuser.
BDR 4.1.0 Bug fix: Use 64 bits for calculating lag size in bytes to eliminate the risk of overflow with large lag.
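
To see why the width of the lag calculation matters, the sketch below computes the byte distance between two LSNs and truncates the result to a given number of bits. The function names are hypothetical and this is not the BDR implementation. The only fact relied on is the standard Postgres textual LSN form, two hexadecimal numbers separated by a slash that hold the high and low 32 bits of a 64-bit position.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '2/A0' into a 64-bit integer position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn: str, replayed_lsn: str, bits: int = 64) -> int:
    """Byte lag between two LSNs, truncated to an unsigned integer of `bits` width.

    Assumes current_lsn is at or ahead of replayed_lsn.
    """
    mask = (1 << bits) - 1
    return (parse_lsn(current_lsn) - parse_lsn(replayed_lsn)) & mask
```

With a lag of exactly 8 GiB, `lag_bytes("2/0", "0/0")` yields 8589934592, whereas the same call with `bits=32` wraps around to 0 and would make a badly lagging node look fully caught up.
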
HARP 2.1.0 Feature: The BDR DCS now uses a push notification from the consensus rather than through polling nodes. This change reduces the time for new leader selection and the load that HARP puts on the BDR DCS since it doesn't need to poll in short intervals anymore.
HARP 2.1.0 Feature: TPA now restarts each HARP Proxy one by one and waits until it comes back to reduce any downtime incurred by the application during software upgrades.
HARP 2.1.0 Feature: The support for embedding PgBouncer directly into HARP Proxy is now deprecated and will be removed in the next major release of HARP. It's now possible to configure TPA to put PgBouncer on the same node as HARP Proxy and point to that HARP Proxy.
HARP 2.1.0 Bug fix: harpctl promote <node_name> would occasionally promote a different node than the one specified. This has been fixed. (RT75406)
HARP 2.1.0 Bug fix: Fencing would sometimes fail when using BDR as the Distributed Consensus Service. This has been corrected.
HARP 2.1.0 Bug fix: harpctl apply no longer turns off routing for leader after the cluster has been established. (RT80790)
HARP 2.1.0 Bug fix: Harp-manager no longer exits if it cannot start a failed database. Harp-manager will keep retrying with randomly increasing periods. (RT78516)
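
One way to picture retrying with randomly increasing periods is the sketch below, where each wait grows by a random increment up to a cap. The function name, parameters, and default values are all assumptions for illustration. These notes don't specify the schedule Harp-manager actually uses.

```python
import random
from typing import Iterator, Optional


def retry_delays(
    base: float = 1.0,
    max_step: float = 5.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield an endless series of wait times that grow by random increments.

    Hypothetical schedule: start at `base` seconds, add a random amount in
    [0, max_step] after each attempt, and never exceed `cap`.
    """
    rng = rng or random.Random()
    delay = base
    while True:
        yield delay
        delay = min(cap, delay + rng.uniform(0.0, max_step))
```

A caller would sleep for each yielded delay between start attempts and stop iterating once the database comes up. The randomness keeps several managers that fail at the same moment from retrying in lockstep.
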
HARP 2.1.0 Bug fix: The internal PgBouncer proxy implementation had a memory leak. This has been fixed.