Durability & Performance Options v3.7

Overview

Synchronous or Eager Replication synchronizes between at least two nodes of the cluster before committing a transaction. This provides three properties of interest to applications, which are related, but can all be implemented individually:

  • Durability: writing to multiple nodes increases crash resilience and allows the data to be recovered after a crash and restart.
  • Visibility: with the commit confirmation to the client, the database guarantees immediate visibility of the committed transaction on some set of nodes.
  • No Conflicts After Commit: the client can rely on the transaction to eventually be applied on all nodes without further conflicts, or get an abort directly informing the client of an error.

BDR integrates with the synchronous_commit option of Postgres itself, providing a variant of synchronous replication, which can be used between BDR nodes. BDR also offers two additional replication modes:

  • Commit At Most Once (CAMO). This feature solves the problem of knowing whether your transaction has COMMITed (and replicated) or not in case of certain errors during COMMIT. Normally, it might be hard to know whether or not the COMMIT was processed. With this feature, your application can find out what happened, even if your new database connection is to a different node than your previous connection. For more info about this feature see the Commit At Most Once chapter.
  • Eager Replication. This is an optional feature to avoid replication conflicts. Every transaction is applied on all nodes simultaneously, and commits only if no replication conflicts are detected. This feature does reduce performance, but provides very strong consistency guarantees. For more info about this feature see the Eager All-Node Replication chapter.

Postgres itself provides Physical Streaming Replication (PSR), which is uni-directional, but offers a synchronous variant that can be used in combination with BDR.

WARNING

This only works when using a single database per node. When using multiple BDR-enabled databases per node, which is not generally recommended, the LSN-based confirmations may originate from any one of the databases on a node specified in synchronous_standby_names and thus do not assure the data is really flushed to disk.

This chapter covers the various forms of synchronous or eager replication and their timing aspects.

Comparison

Most options for synchronous replication available to BDR allow for different levels of synchronization, offering different trade-offs between performance and protection against node or network outages.

The following table summarizes what a client can expect from a peer node that is replicated to, once the client has received a COMMIT confirmation from the origin node the transaction was issued to.

Variant   Mode                      Received  Visible  Durable
PGL/BDR   off (default)             no        no       no
PGL/BDR   remote_write (2)          yes       no       no
PGL/BDR   on (2)                    yes       yes      yes
PGL/BDR   remote_apply (2)          yes       yes      yes
PSR       remote_write (2)          yes       no       no (1)
PSR       on (2)                    yes       no       yes
PSR       remote_apply (2)          yes       yes      yes
CAMO      remote_write (2)          yes       no       no
CAMO      remote_commit_async (2)   yes       yes      no
CAMO      remote_commit_flush (2)   yes       yes      yes
Eager     n/a                       yes       yes      yes

(1) written to the OS, durable if the OS remains running and only Postgres crashes.

(2) unless switched to Local mode (if allowed) by setting synchronous_replication_availability to 'async', in which case the values for the asynchronous BDR default apply.

Reception ensures the peer will be able to eventually apply all changes of the transaction without requiring any further communication, i.e. even in the face of a full or partial network outage. All modes considered synchronous provide this protection.

Visibility implies the transaction was applied remotely, and any possible conflicts with concurrent transactions have been resolved. Without durability, i.e. prior to persisting the transaction, a crash of the peer node may revert this state (and require re-transmission and re-application of the changes).

Durability relates to the peer node's storage and provides protection against loss of data after a crash and recovery of the peer node. If the transaction was already visible before the crash, it will be recovered to be visible again. Otherwise, the transaction's payload is persisted and the peer node will be able to apply the transaction eventually (without requiring any re-transmission of data).
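
As a reading aid, the comparison table can be expressed as a small lookup. This is only a sketch: the names Guarantee, GUARANTEES and modes_providing are made up for illustration and are not part of BDR or Postgres, and footnotes (1) and (2) are not modeled.

```python
from typing import NamedTuple


class Guarantee(NamedTuple):
    received: bool
    visible: bool
    durable: bool


# (variant, mode) -> what the peer guarantees once the client sees the COMMIT
# confirmation. Transcribed from the comparison table; footnotes not modeled.
GUARANTEES = {
    ("PGL/BDR", "off"): Guarantee(False, False, False),
    ("PGL/BDR", "remote_write"): Guarantee(True, False, False),
    ("PGL/BDR", "on"): Guarantee(True, True, True),
    ("PGL/BDR", "remote_apply"): Guarantee(True, True, True),
    ("PSR", "remote_write"): Guarantee(True, False, False),
    ("PSR", "on"): Guarantee(True, False, True),
    ("PSR", "remote_apply"): Guarantee(True, True, True),
    ("CAMO", "remote_write"): Guarantee(True, False, False),
    ("CAMO", "remote_commit_async"): Guarantee(True, True, False),
    ("CAMO", "remote_commit_flush"): Guarantee(True, True, True),
    ("Eager", "n/a"): Guarantee(True, True, True),
}


def modes_providing(visible=False, durable=False):
    """Return the (variant, mode) pairs that meet the requested guarantees."""
    return [
        key
        for key, g in GUARANTEES.items()
        if (g.visible or not visible) and (g.durable or not durable)
    ]
```

For example, modes_providing(durable=True) includes PSR with synchronous_commit set to on, while modes_providing(visible=True, durable=True) does not, because PSR persists a transaction before applying it.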

Internal Timing of Operations

For a better understanding of how the different modes work, it is helpful to realize that PSR and PGLogical apply transactions rather differently.

With physical streaming replication, the order of operations is:

  • origin flushes a commit record to WAL, making the transaction visible locally
  • peer node receives changes and issues a write
  • peer flushes the received changes to disk
  • peer applies changes, making the transaction visible locally

With PGLogical, the order of operations is different:

  • origin flushes a commit record to WAL, making the transaction visible locally
  • peer node receives changes into its apply queue in memory
  • peer applies changes, making the transaction visible locally
  • peer persists the transaction by flushing to disk

For CAMO and Eager All Node Replication, note that the origin node waits for a confirmation prior to making the transaction visible locally. The order of operations is:

  • origin flushes a prepare or pre-commit record to WAL
  • peer node receives changes into its apply queue in memory
  • peer applies changes, making the transaction visible locally
  • peer persists the transaction by flushing to disk
  • origin commits and makes the transaction visible locally

The following table summarizes the differences.

Variant  Order of apply vs persist on peer nodes  Replication before or after origin WAL commit record write
PSR      persist first                            after
PGL      apply first                              after
CAMO     apply first                              before (triggered by pre-commit)
Eager    apply first                              before (triggered by prepare)
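
The orderings described in this section can also be written down as plain step lists, from which both columns of the summary table follow. This is an illustrative sketch only: the step labels and function names are invented here and do not correspond to any BDR or Postgres API.

```python
# Steps on the commit path, in order, per variant (transcribed from the lists
# in this section). The step labels are illustrative, not real identifiers.
ORDER = {
    "PSR": ["origin_commit", "peer_receive", "peer_persist", "peer_apply"],
    "PGL": ["origin_commit", "peer_receive", "peer_apply", "peer_persist"],
    "CAMO": ["origin_precommit", "peer_receive", "peer_apply",
             "peer_persist", "origin_commit"],
    "Eager": ["origin_prepare", "peer_receive", "peer_apply",
              "peer_persist", "origin_commit"],
}


def applies_before_persisting(variant):
    """True if the peer makes the transaction visible before flushing it."""
    steps = ORDER[variant]
    return steps.index("peer_apply") < steps.index("peer_persist")


def replicates_before_origin_commit(variant):
    """True if the peer receives the changes before the origin commits."""
    steps = ORDER[variant]
    return steps.index("peer_receive") < steps.index("origin_commit")
```

With these lists, only PSR persists before applying, and only CAMO and Eager replicate before the origin writes its commit record.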

Configuration

The following table shows, for each variant, which configuration settings must be set to a non-default value (req) and which are optional but still affect that variant (opt).

Setting (GUC)                           PSR  PGL  CAMO  Eager
synchronous_standby_names               req  req  n/a   n/a
synchronous_commit                      opt  opt  n/a   n/a
synchronous_replication_availability    opt  opt  opt   n/a
bdr.enable_camo                         n/a  n/a  req   n/a
bdr.camo_origin_for                     n/a  n/a  req   n/a
bdr.camo_partner_of (on partner node)   n/a  n/a  req   n/a
bdr.commit_scope                        n/a  n/a  n/a   req
bdr.global_commit_timeout               n/a  n/a  opt   opt
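
One possible use of this table is a pre-flight check that reports which required settings are still missing for a chosen variant. The sketch below assumes the settings have already been collected into a dictionary by some other means; REQUIRED and missing_settings are hypothetical names, not BDR functions.

```python
# Settings marked "req" in the configuration table, per variant.
# Note: bdr.camo_partner_of has to be set on the CAMO partner node,
# not on the origin, so a real check would inspect both nodes.
REQUIRED = {
    "PSR": {"synchronous_standby_names"},
    "PGL": {"synchronous_standby_names"},
    "CAMO": {"bdr.enable_camo", "bdr.camo_origin_for", "bdr.camo_partner_of"},
    "Eager": {"bdr.commit_scope"},
}


def missing_settings(variant, configured):
    """Return the required settings for `variant` absent from `configured`.

    `configured` maps setting names to their non-default values.
    """
    return sorted(REQUIRED[variant] - set(configured))
```

An empty result only means that every required setting has some value; it does not validate the values themselves.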

Planned Shutdown and Restarts

When using PGL or CAMO in combination with remote_write, care must be taken with planned shutdown or restart. By default, the apply queue is consumed prior to shutting down. However, in the immediate shutdown mode, the queue is discarded at shutdown, leading to the stopped node "forgetting" transactions in the queue. A concurrent failure of another node could lead to loss of data, as if both nodes failed.

To ensure the apply queue gets flushed to disk, please use either smart or fast shutdown for maintenance tasks. This maintains the required synchronization level and prevents loss of data.
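
For scripted maintenance, a small wrapper can refuse the immediate mode outright. The following is a minimal sketch that assumes pg_ctl is on the PATH and that the data directory is passed in by the caller; the function names are illustrative.

```python
import subprocess

# Shutdown modes that let the node consume its apply queue first.
SAFE_MODES = ("smart", "fast")


def stop_command(data_dir, mode="fast"):
    """Build a pg_ctl stop command, rejecting the immediate mode."""
    if mode not in SAFE_MODES:
        raise ValueError(
            f"shutdown mode {mode!r} may discard the apply queue; "
            f"use one of {SAFE_MODES}"
        )
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]


def stop_node(data_dir, mode="fast"):
    """Run the shutdown and raise if pg_ctl reports a failure."""
    subprocess.run(stop_command(data_dir, mode), check=True)
```

Keeping the command construction separate from its execution makes the guard easy to test without a running server.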

Synchronous Replication using PGLogical

Usage

To enable synchronous replication using PGLogical, the application names of the relevant BDR peer nodes need to be added to synchronous_standby_names. The use of FIRST x or ANY x offers a lot of flexibility, as long as this does not conflict with the requirements of non-BDR standby nodes.

Once added, the level of synchronization can be configured per transaction via synchronous_commit, which defaults to on - meaning that adding to synchronous_standby_names already enables synchronous replication. Setting synchronous_commit to local or off turns off synchronous replication.

Due to PGLogical applying the transaction before persisting it, the values on and remote_apply are equivalent (for logical replication).
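
The two steps described above (listing peers in synchronous_standby_names, then choosing a level per transaction) can be sketched as helpers that produce the setting value and the SQL text. The node names in the usage note are placeholders, the helper names are hypothetical, and the restriction to simple unquoted names and to a count no larger than the list are simplifications of this sketch rather than Postgres rules.

```python
import re

# This sketch only accepts names that need no quoting.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Values Postgres accepts for synchronous_commit.
_COMMIT_LEVELS = ("off", "local", "remote_write", "on", "remote_apply")


def standby_names(method, count, names):
    """Build a synchronous_standby_names value, e.g. 'ANY 1 (a, b)'."""
    method = method.upper()
    if method not in ("FIRST", "ANY"):
        raise ValueError("method must be FIRST or ANY")
    if not 1 <= count <= len(names):
        raise ValueError("count must be between 1 and the number of names")
    for name in names:
        if not _SIMPLE_NAME.match(name):
            raise ValueError(f"name {name!r} would need quoting")
    return f"{method} {count} ({', '.join(names)})"


def set_commit_level_sql(level):
    """SQL that sets synchronous_commit for the current transaction only."""
    if level not in _COMMIT_LEVELS:
        raise ValueError(f"unknown synchronous_commit level {level!r}")
    return f"SET LOCAL synchronous_commit = '{level}';"
```

For instance, standby_names("any", 1, ["node_b", "node_c"]) yields ANY 1 (node_b, node_c), and set_commit_level_sql("remote_write") yields a statement to issue inside a transaction block that should wait only for reception on the peer.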

Limitations

PGLogical uses the same configuration (and internal mechanisms) as Physical Streaming Replication. Therefore, the needs of (physical, non-BDR) standbys must be considered when configuring synchronous replication between BDR nodes using PGLogical. Most importantly, it is not possible to use different synchronization modes for a single transaction.