Known issues v4

This section discusses currently known issues in EDB Postgres Distributed 4.

Data consistency

Read about Conflicts to understand the implications of the asynchronous operation mode for data consistency.

List of issues

These known issues are tracked in BDR's ticketing system and are expected to be resolved in a future release.

  • Performance of HARP, in terms of failover and switchover time, depends nonlinearly on the latencies between DCS nodes. For that reason, we currently recommend using one etcd cluster per region for HARP when deploying EDB Postgres Distributed across multiple regions (typically the Gold and Platinum layouts). TPAexec already sets up etcd to run as a per-region cluster for these layouts when the harp_consensus_protocol option is set to etcd in config.yml.

    We recommend increasing the leader_lease_duration HARP option (harp_leader_lease_duration in TPAexec) for DCS deployments across higher-latency networks; see the configuration sketch after this list.

  • If the resolver for the update_origin_change conflict is set to skip, synchronous_commit=remote_apply is used, and concurrent updates of the same row are repeatedly applied on two different nodes, then one of the update statements might hang due to a deadlock with the BDR writer. As mentioned in the Conflicts chapter, skip isn't the default resolver for the update_origin_change conflict, and this combination isn't intended for production use: it discards one of the two conflicting updates based on the order of arrival on that node, which is likely to cause a divergent cluster. In the rare situation that you do choose the skip conflict resolver, be aware of this issue when also using remote_apply mode (see the sketch after this list).

  • The Decoding Worker feature doesn't work with CAMO/Eager/Group Commit. Installations using CAMO/Eager/Group Commit must keep enable_wal_decoder disabled (see the example after this list).

  • Decoding Worker works only with the default replication sets.

  • Lag control doesn't adjust commit delay in any way on a fully isolated node, that is, in case all other nodes are unreachable or not operational. As soon as at least one node is connected, replication lag control picks up its work and adjusts the BDR commit delay again.

  • For time-based lag control, BDR currently uses the lag time (measured by commit timestamps) rather than the estimated catchup time that's based on the historic apply rate (see the configuration sketch after this list).

  • Changing the CAMO partners in a CAMO pair isn't currently possible. It's possible only to add or remove a pair (see the example after this list). Adding or removing a pair doesn't need a restart of Postgres or even a reload of the configuration.

  • Group Commit can't be combined with CAMO or Eager All-Node replication. Eager Replication currently works only by using the "global" BDR commit scope (see the example after this list).

  • Neither Eager Replication nor Group Commit supports synchronous_replication_availability = 'async'.

  • Group Commit doesn't support timing out a commit after bdr.global_commit_timeout.

  • Transactions using Eager Replication can't yet execute DDL, nor do they support explicit two-phase commit. The TRUNCATE command is allowed.

  • Not all DDL can be run when either CAMO or Group Commit is used.

  • Parallel apply isn't currently supported in combination with Group Commit. Make sure to disable it when using Group Commit by either setting num_writers to 1 for the node group (using bdr.alter_node_group_config) or via the GUC bdr.writers_per_subscription (see Configuration of Generic Replication and the example after this list).

  • There's currently no protection against altering or removing a commit scope. Running transactions in a commit scope that's concurrently being altered or removed can lead to the transaction blocking or to replication stalling completely due to an error on the downstream node attempting to apply the transaction. Make sure that any transactions using a specific commit scope have finished before altering or removing it (see the example after this list).
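
For illustration, the HARP recommendations above translate into a TPAexec config.yml fragment along these lines. This is a minimal sketch: the lease value is purely illustrative, and the placement of harp_leader_lease_duration is assumed to follow the same cluster_vars convention as harp_consensus_protocol.

    cluster_vars:
      # Run one etcd cluster per region as the HARP DCS
      harp_consensus_protocol: etcd
      # Lengthen the leader lease for higher-latency DCS links
      # (value in seconds, illustrative only)
      harp_leader_lease_duration: 12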
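
The problematic combination from the update_origin_change item is sketched below, assuming a hypothetical node named node_a. bdr.alter_node_set_conflict_resolver is the documented way to change a conflict resolver.

    -- Set the non-default 'skip' resolver for update_origin_change
    SELECT bdr.alter_node_set_conflict_resolver(
        node_name := 'node_a',
        conflict_type := 'update_origin_change',
        conflict_resolver := 'skip');

    -- Together with remote_apply, repeated concurrent updates of the same
    -- row on two nodes can deadlock with the BDR writer
    SET synchronous_commit = 'remote_apply';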
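
Keeping the Decoding Worker disabled is a node group setting; a sketch, assuming a hypothetical group named mygroup:

    -- Ensure the Decoding Worker stays off on CAMO/Eager/Group Commit installations
    SELECT bdr.alter_node_group_config(
        node_group_name := 'mygroup',
        enable_wal_decoder := false);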
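
Time-based lag control, as discussed in the two lag control items, is driven by GUCs such as bdr.lag_control_max_lag_time and bdr.lag_control_max_commit_delay. A sketch with purely illustrative values; see the lag control documentation for units and defaults:

    -- Target a maximum lag time and cap the injected commit delay
    ALTER SYSTEM SET bdr.lag_control_max_lag_time = 30000;
    ALTER SYSTEM SET bdr.lag_control_max_commit_delay = 100;
    SELECT pg_reload_conf();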
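
Because CAMO partners can't be changed in place, swapping a partner means removing the existing pair and adding a new one. A sketch with hypothetical group and node names:

    -- Remove the old pair, then add the pair with the new partner
    SELECT bdr.remove_camo_pair(
        node_group := 'mygroup',
        left_node := 'node_a',
        right_node := 'node_b');
    SELECT bdr.add_camo_pair(
        node_group := 'mygroup',
        left_node := 'node_a',
        right_node := 'node_c');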
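
Requesting Eager Replication for a transaction through the "global" commit scope looks roughly like this (the accounts table is hypothetical):

    BEGIN;
    -- Ask for Eager All-Node replication for this transaction only
    SET LOCAL bdr.commit_scope = 'global';
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

As noted in the list above, such a transaction can't execute DDL or use explicit two-phase commit, although TRUNCATE is allowed.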
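
Both documented ways of disabling parallel apply for Group Commit, sketched with a hypothetical node group name:

    -- Either per node group:
    SELECT bdr.alter_node_group_config(
        node_group_name := 'mygroup',
        num_writers := 1);

    -- Or via the GUC:
    ALTER SYSTEM SET bdr.writers_per_subscription = 1;
    SELECT pg_reload_conf();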
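
The safe order of operations when removing a commit scope, assuming the hypothetical names example_scope and example_group and that bdr.remove_commit_scope is available in your version:

    -- Only run this after confirming that all transactions using the
    -- scope have finished
    SELECT bdr.remove_commit_scope(
        commit_scope_name := 'example_scope',
        origin_node_group := 'example_group');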

List of limitations

This is a (non-comprehensive) list of limitations that are expected and are by design. They are not expected to be resolved in the future.

  • Replacing a node with its physical standby doesn't work for nodes that use CAMO/Eager/Group Commit. Combining physical standbys and BDR in general isn't recommended, even if otherwise possible.

  • A galloc sequence might skip some chunks if the sequence is created in a rolled-back transaction and then created again with the same name. This can also occur if it's created and dropped while DDL replication isn't active and then created again while DDL replication is active. The impact is mild because the sequence guarantees aren't violated; the sequence skips only some initial chunks. As a workaround, you can specify the starting value for the sequence as an argument to the bdr.alter_sequence_set_kind() function (see the example below).

  • Legacy BDR synchronous replication uses a mechanism for transaction confirmation different from the one used by CAMO, Eager, and Group Commit. The two aren't compatible and must not be used together. Therefore, nodes that appear in synchronous_standby_names must not be part of a CAMO, Eager, or Group Commit configuration. Using synchronous replication to other nodes, including both logical and physical standbys, is possible (see the sketch below).
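
A sketch of the galloc workaround, using a hypothetical sequence name and starting value; the third argument to bdr.alter_sequence_set_kind() is the starting value mentioned above:

    CREATE SEQUENCE myseq;
    -- Supplying an explicit starting value skips past any chunks lost to
    -- the create/drop scenarios described above
    SELECT bdr.alter_sequence_set_kind('myseq'::regclass, 'galloc', 10000);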
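
A sketch of the synchronous_standby_names constraint, with a hypothetical standby name: the entry may be a logical or physical standby, but never a node that's part of a CAMO, Eager, or Group Commit configuration.

    -- Acceptable: a standby outside any CAMO/Eager/Group Commit configuration
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_1)';
    SELECT pg_reload_conf();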