Commit At Most Once (CAMO) v3.7

The objective of the Commit At Most Once (CAMO) feature is to prevent the application from committing the same transaction more than once.

Without CAMO, when a client loses connection after COMMIT has been submitted, the application might not receive a reply from the server and will therefore be unsure whether the transaction committed or not.

The application cannot easily choose between the two options of:

1) retrying the transaction with the same data, which can in some cases cause the data to be entered twice, or

2) not retrying the transaction, which risks the data not being processed at all.

Either choice is a critical error with high-value data.

There are two ways to avoid this situation.

One way is to make sure that the transaction includes at least one INSERT into a table with a unique index. However, this depends on the application design and requires application-specific error-handling logic, so it is not effective in all cases.

The CAMO feature in BDR offers a more general solution and does not require an INSERT as described above. When activated via bdr.enable_camo or bdr.commit_scope, the application receives a message containing the transaction identifier, if one is already assigned. Otherwise, the first write statement in a transaction sends that information to the client. If the application sends an explicit COMMIT, the protocol ensures that the application has received the notification of the transaction identifier before the COMMIT is sent. If the server does not reply to the COMMIT, the application can handle this error by using the transaction identifier to request the final status of the transaction from another BDR node. If the prior transaction status is known, the application can safely decide whether or not to retry the transaction.

CAMO works in one of two modes:

  • Pair mode
  • In combination with Eager All Node Replication

In Pair mode, CAMO works by creating a pair of partner nodes: two BDR master nodes from the same top-level BDR group. In this mode of operation, each node in the pair knows the outcome of any recent transaction executed on its peer, and in particular (for our purposes) knows the outcome of any transaction disconnected during COMMIT. We refer to the node that receives transactions from the application as the "origin" and to the node that confirms these transactions as the "partner", but there is no difference in the CAMO configuration for the nodes of a CAMO pair: the pair is symmetric.

When combined with Eager All Node Replication, CAMO enables every peer (that is, every full BDR master node) to act as a CAMO partner. No designated CAMO partner needs to be configured in this mode.

Warning

CAMO requires changes to the user's application to take advantage of the advanced error handling: it is not sufficient to enable a parameter to gain protection. Reference client implementations are provided in Appendix E.

Requirements

To use CAMO, an application must issue an explicit COMMIT message as a separate request, not as part of a multi-statement request. CAMO cannot provide status for transactions issued from within procedures, or for single-statement transactions that use implicit commits.

Configuration

Assuming an existing BDR cluster consisting of the two nodes node1 and node2, both with a BDR-enabled database called bdrdemo, and both part of the same node group mygroup, the following steps configure the nodes to be CAMO partners for each other.

1) Create the BDR cluster where nodes node1 and node2 are part of the mygroup node group.

2) Run the function bdr.add_camo_pair() on one node:

SELECT bdr.add_camo_pair('mygroup', 'node1', 'node2');

3) Adjust the application to use the COMMIT error handling that CAMO suggests.
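To verify the pairing, run bdr.get_configured_camo_partner() (described later in this section) on either node; on node1, it should report node2 as the configured partner:

SELECT bdr.get_configured_camo_partner();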

We do not recommend enabling CAMO at the server level, as this imposes higher latency on all transactions, even when it is not needed. Instead, we recommend enabling it selectively for individual transactions by turning on CAMO at the session or transaction level.

To enable at session level, issue:

SET bdr.enable_camo = 'remote_commit_flush';

...or, to enable it for an individual transaction, issue the following after starting the transaction and before committing it:

SET LOCAL bdr.enable_camo = 'remote_commit_flush';
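Put together, transaction-level usage looks like the following minimal sketch (the INSERT and the table name mytable are placeholders for the application's actual statements):

BEGIN;
SET LOCAL bdr.enable_camo = 'remote_commit_flush';
INSERT INTO mytable VALUES (1);  -- placeholder for the application's writes
COMMIT;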

Valid values for bdr.enable_camo are:

  • off (default)
  • remote_write
  • remote_commit_async
  • remote_commit_flush or on

See the Comparison of synchronous replication modes for details about how each mode behaves. The default setting, bdr.enable_camo = off, disables this feature.

CAMO with Eager All Node Replication

To use CAMO with Eager All Node Replication, no changes are required on either node. It is sufficient to enable the global commit scope after the start of the transaction; you do not need to set bdr.enable_camo at all.

BEGIN;
SET LOCAL bdr.commit_scope = 'global';
...
COMMIT;

The application still needs to be adjusted to use COMMIT error handling as specified, but is free to connect to any available BDR node to query the transaction's status.
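For example, after a disconnect during COMMIT, the client can query the transaction's status on any node by passing require_camo_partner as false (a sketch; :node_id and :xid stand for the values the client retrieved before COMMIT, as described under the transaction status query function below):

SELECT bdr.logical_transaction_status(:node_id, :xid, false);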

Failure Scenarios

In this section, we analyze failure scenarios for different configurations. After comparing Local mode with CAMO mode in terms of availability versus consistency, we also provide a specific example.

Data persistence at receiver side

By default, a PGL writer operates with bdr.synchronous_commit = off when applying transactions from remote nodes. This holds true for CAMO as well, meaning that transactions may be confirmed to the origin node before reaching the disk of the CAMO partner. In the case of a crash or hardware failure, a confirmed transaction may therefore not be recoverable on the CAMO partner by itself. This is not a problem as long as the CAMO origin node remains operational, as it will redistribute the transaction once the CAMO partner node recovers.

This in turn means CAMO can protect against a single node failure; the same holds for Local mode as well as Remote Write (or even a combination of the two).

To cover an outage of both nodes of a CAMO pair, bdr.synchronous_commit = local can be used to enforce a flush prior to the pre-commit confirmation. This does not work in combination with either Remote Write or Local mode, and it has an additional performance impact due to the additional I/O requirements on the CAMO partner in the latency-sensitive commit path.
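As a sketch, this flush could be enforced per node as follows (assuming superuser access, and assuming a configuration reload suffices to apply the change):

ALTER SYSTEM SET bdr.synchronous_commit = 'local';
SELECT pg_reload_conf();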

Local Mode

When synchronous_replication_availability = 'async', a node (i.e. a master) detects whether its CAMO partner is ready; if not, it temporarily switches to Local mode. While in Local mode, the node commits transactions locally until it switches back to CAMO mode.

This clearly does not allow COMMIT status to be retrieved, but it does provide the option to choose availability over consistency. Local mode can tolerate a single node failure. However, if both nodes of a CAMO pair fail, they may make incongruent commit decisions to maintain availability, leading to data inconsistencies.

For a CAMO partner to switch to ready, it needs to be connected, and the estimated catchup interval needs to drop below bdr.global_commit_timeout. The current readiness status of a CAMO partner can be checked with bdr.is_camo_partner_ready, while bdr.node_replication_rates provides the current estimate of the catchup time.
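Both can be inspected with ordinary queries, for example (a monitoring sketch; the exact columns of bdr.node_replication_rates may vary between BDR versions):

SELECT bdr.is_camo_partner_ready();
SELECT * FROM bdr.node_replication_rates;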

The switch from CAMO-protected to Local mode is only ever triggered by an actual CAMO transaction: either the commit exceeds bdr.global_commit_timeout, or the CAMO partner is already known to be disconnected at the time of commit. This switch is independent of the estimated catchup interval. If the CAMO pair is configured to require Raft to switch to Local mode, this switch requires a majority of nodes to be operational (see the require_raft flag for bdr.add_camo_pair). This can prevent an isolated node from switching to Local mode, avoiding a split-brain situation. If require_raft is not set for the CAMO pair, the origin node switches to Local mode immediately.

The detection on the sending node can be configured via the PostgreSQL settings controlling keep-alives and timeouts on the TCP connection to the CAMO partner. The wal_sender_timeout is the amount of time a node waits for the CAMO partner before switching to Local mode. Additionally, the bdr.global_commit_timeout setting puts a per-transaction limit on the maximum delay a COMMIT can incur due to the CAMO partner being unreachable. It may well be set lower than wal_sender_timeout, which also influences synchronous standbys and for which a good compromise between responsiveness and stability needs to be found.

The switch from Local mode back to CAMO mode depends on the CAMO partner node, which initiates the connection. The CAMO partner tries to reconnect at least every 30 seconds, so after connectivity is reestablished, it may take up to 30 seconds until the CAMO partner connects back to its origin node. Any lag that accumulated on the CAMO partner further delays the switch back to CAMO-protected mode.

Unlike during normal CAMO operation, Local mode adds no commit overhead. This can be problematic, as it allows the node to continuously process more transactions than the CAMO pair could normally handle. Even if the CAMO partner eventually reconnects and applies transactions, its lag will only ever increase in such a situation, preventing the re-establishment of CAMO protection. To artificially throttle transactional throughput, BDR provides the bdr.camo_local_mode_delay setting, which delays COMMITs in Local mode by a configurable amount of time. We recommend measuring commit times in normal CAMO mode during expected workloads and configuring this delay accordingly. The default is 5 ms, which reflects a local network and a relatively quick CAMO partner response.
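For example, if commits in normal CAMO mode take around 10 ms under the expected workload, the delay could be configured to match (a sketch; shown at session level, though the setting may equally belong in postgresql.conf):

SET bdr.camo_local_mode_delay = '10ms';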

The choice of whether to allow Local mode should be made in view of the architecture and the availability requirements. We expand on this point by discussing a specific example in some detail.

Example: Symmetric Node Pair

In this section we consider a setup of two BDR nodes that are CAMO partners of each other. Starting with BDR 4, this is the only possible CAMO configuration.

This configuration enables CAMO behavior on both nodes; it is therefore suitable for workload patterns where it is acceptable to write concurrently on more than one node, e.g. in cases that are not likely to generate conflicts.

With Local Mode

If Local mode is allowed, there is no single point of failure, and when one node fails:

  • The other node can determine the status of all transactions that were disconnected during COMMIT on the failed node.
  • New write transactions are allowed. If the second node also fails, the outcome of those transactions that were being committed at that time will be unknown.

Without Local Mode

If Local mode is not allowed, then each node requires the other node for committing transactions; that is, each node is a single point of failure. Specifically, when one node fails:

  • The other node can determine the status of all transactions that were disconnected during COMMIT on the failed node.
  • New write transactions will be prevented until the node recovers.

Application Usage

Overview and Requirements

Commit At Most Once relies on a retry loop and specific error handling on the client side. There are three aspects to it:

  • The result of a transaction's COMMIT needs to be checked, and in case of a temporary error, the client must retry the transaction.
  • Prior to COMMIT, the client needs to retrieve a global identifier for the transaction, consisting of a node id and a transaction id (both 32-bit integers).
  • Should the current server fail while attempting COMMIT of a transaction, the application must connect to its CAMO partner, retrieve the status of that transaction, and retry depending on the response.

Note that the application needs to store the global transaction identifier only for the purpose of verifying the transaction status in case of disconnection during COMMIT. In particular, the application does not need any additional persistence layer: if the application fails, it only needs the information in the database to restart.

Adding a CAMO pair

The function bdr.add_camo_pair() configures an existing pair of BDR nodes to work as a symmetric CAMO pair.

The require_raft option controls how and when to switch to Local mode when synchronous_replication_availability is set to async (which allows such a switch in general).

Synopsis

bdr.add_camo_pair(node_group text, left_node text, right_node text,
                  require_raft bool)
Note

The names left and right have no special meaning.

Note

Since BDR version 4.0, only symmetric CAMO configurations are supported, i.e. both nodes of the pair act as a CAMO partner for each other.
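For example, to require a Raft majority before the pair may switch to Local mode, pass require_raft as true:

SELECT bdr.add_camo_pair('mygroup', 'node1', 'node2', true);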

Changing the configuration of a CAMO pair

The function bdr.alter_camo_pair() allows toggling the require_raft flag. Note that it is not currently possible to change the nodes of a pairing; instead, use bdr.remove_camo_pair followed by bdr.add_camo_pair.

Synopsis

bdr.alter_camo_pair(node_group text, left_node text, right_node text,
                    require_raft bool)
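For example, to enable the require_raft flag for the pair created in the Configuration section:

SELECT bdr.alter_camo_pair('mygroup', 'node1', 'node2', true);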

Removing a CAMO pair

The function bdr.remove_camo_pair() removes a CAMO pairing of two nodes and disallows future use of CAMO transactions via bdr.enable_camo on those two nodes.

Synopsis

bdr.remove_camo_pair(node_group text, left_node text, right_node text)
Note

The names left and right have no special meaning.

CAMO partner connection status

The function bdr.is_camo_partner_connected allows checking the connection status of a CAMO partner node configured in Pair mode. There is currently no equivalent for CAMO used in combination with Eager Replication.

Synopsis

bdr.is_camo_partner_connected()

Return value

A boolean value indicating whether the CAMO partner is currently connected to a WAL sender process on the local node and therefore able to receive transactional data and send back confirmations.

CAMO partner readiness

The function bdr.is_camo_partner_ready allows checking the readiness status of a CAMO partner node configured in Pair mode. Internally, this is what is used to trigger the switch to and from Local mode.

Synopsis

bdr.is_camo_partner_ready()

Return value

A boolean value indicating whether the CAMO partner can reasonably be expected to confirm transactions originating from the local node in a timely manner (i.e. before bdr.global_commit_timeout expires).

Note

This function queries the past or current state. A positive return value is no guarantee that the CAMO partner will be able to confirm future transactions.

Fetch the CAMO partner

This function shows the local node's CAMO partner (configured via Pair mode).

bdr.get_configured_camo_partner()

Wait for consumption of the apply queue from the CAMO partner

The function bdr.wait_for_camo_partner_queue is a wrapper around bdr.wait_for_apply_queue that defaults to querying the CAMO partner node. It yields an error if the local node is not part of a CAMO pair.

Synopsis

bdr.wait_for_camo_partner_queue()

Transaction status between CAMO nodes

This function allows waiting until CAMO transactions have been fully resolved.

bdr.camo_transactions_resolved()

Transaction status query function

The application should use the function:

bdr.logical_transaction_status(node_id, xid, require_camo_partner)

...to check the status of a transaction which was being committed when the node failed.

With CAMO used in Pair mode, this function should only ever be used on a node that's part of a CAMO pair. In combination with Eager Replication, it may be used on all nodes.

In both cases, the function needs to be called within 15 minutes after the commit was issued, as the CAMO partner needs to regularly purge such meta-information and therefore cannot provide correct answers for older transactions.

Prior to querying the status of a transaction, this function waits for the receive queue to be consumed and fully applied. This prevents early negative answers for transactions that have already been received but not yet applied.

Note that despite its name, it is not always a read-only operation. If the status is unknown, the CAMO partner will decide whether to commit or abort the transaction, storing that decision locally to ensure consistency going forward.

Also note that the client must not call this function before attempting to commit on the origin, otherwise the transaction may be forced to be rolled back.

Synopsis

bdr.logical_transaction_status(node_id OID,
                               xid     OID,
                               require_camo_partner BOOL DEFAULT true)

Parameters

  • node_id - the node id of the BDR node the transaction originates from, usually retrieved by the client before COMMIT from the PQ parameter bdr.local_node_id.
  • xid - the transaction id on the origin node, usually retrieved by the client before COMMIT from the PQ parameter transaction_id (this requires bdr.enable_camo to be set to remote_write, remote_commit_async, remote_commit_flush, or on; see Commit at Most Once Settings)
  • require_camo_partner - defaults to true and enables configuration checks; may be set to false to disable these checks and query the status of a transaction that was protected by Eager All Node Replication.
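For example, with placeholder values for the identifiers previously retrieved by the client (123 and 4711 are purely illustrative):

SELECT bdr.logical_transaction_status(123, 4711);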

Return value

The function will return one of these results:

  • 'committed'::TEXT - the transaction has been committed, is visible on both nodes of the CAMO pair and will eventually be replicated to all other BDR nodes. No need for the client to retry it.

  • 'aborted'::TEXT - the transaction has been aborted and will not be replicated to any other BDR node. The client needs to either retry it or escalate the failure to commit the transaction.

  • 'in progress'::TEXT - the transaction is still in progress on this local node and has not yet been committed or aborted. Note that the transaction may well be in the COMMIT phase, waiting for the CAMO partner to confirm or deny the commit. The recommended client reaction is to disconnect from the origin node and reconnect to the CAMO partner to query that instead. See the isTransactionCommitted method of the reference clients. With a load balancer or proxy in between, where the client lacks control over which node gets queried, the client can only poll repeatedly until the status switches to either 'committed' or 'aborted'.

    For Eager All Node Replication, peer nodes yield this result for transactions that are not yet committed or aborted. This means that even transactions not yet replicated (or not even started on the origin node) may yield an in progress result on a peer BDR node in this case. However, the client must not query the transaction status prior to attempting to commit on the origin.

  • 'unknown'::TEXT - the transaction specified is unknown, either because it is in the future, not replicated to that specific node yet, or too far in the past. The status of such a transaction is not yet or no longer known. This return value is a sign of improper use by the client.

The client must be prepared to retry the function call on error.

Connection pools and proxies

The effect of connection pools and proxies needs to be considered when designing a CAMO cluster. A proxy may freely distribute transactions to all nodes in the commit group (i.e. to both nodes of a CAMO pair or to all BDR nodes in case of Eager All Node Replication).

Care needs to be taken to ensure that the application fetches the proper node id: when using session pooling, the client remains connected to the same node, so the node id remains constant for the lifetime of the client session. However, with finer-grained transaction pooling, the client needs to fetch the node id for every transaction (as in the example given below).

A client that is not directly connected to the BDR nodes might not even notice a failover or switchover, but can always use the bdr.local_node_id parameter to determine which node it is currently connected to. In the crucial situation of a disconnect during COMMIT, the proxy must properly forward that disconnect as an error to the client applying the CAMO protocol.
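Assuming bdr.local_node_id behaves like a regular configuration parameter, a client can also retrieve it with a plain query rather than reading protocol-level parameter status messages:

SHOW bdr.local_node_id;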

For CAMO in remote_write mode, a proxy that potentially switches between the CAMO pairs must use the bdr.wait_for_camo_partner_queue function to prevent stale reads.
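For example, before routing read traffic to the other node of the pair, such a proxy could issue:

SELECT bdr.wait_for_camo_partner_queue();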

HARP is the only proxy that supports all of the above requirements. PgBouncer and HAProxy can work with CAMO, but do not support CAMO's remote_write mode.

Example

The following example demonstrates what a retry loop of a CAMO-aware client application should look like in C-like pseudo-code. It expects two DSNs, origin_dsn and partner_dsn, providing connection information. These are usually the same DSNs as used for the initial call to bdr.create_node, and can be looked up in bdr.node_summary, column interface_connstr.

PGconn *conn = PQconnectdb(origin_dsn);
PGresult *res;

loop {
    // start a transaction
    PQexec(conn, "BEGIN");

    // apply transactional changes
    PQexec(conn, "INSERT INTO ...");
    ...

    // store a globally unique transaction identifier
    node_id = PQparameterStatus(conn, "bdr.local_node_id");
    xid = PQparameterStatus(conn, "transaction_id");

    // attempt to commit
    res = PQexec(conn, "COMMIT");
    if (PQresultStatus(res) == PGRES_COMMAND_OK)
        return SUCCESS;
    else if (PQstatus(conn) == CONNECTION_BAD)
    {
        // Re-connect to the partner
        conn = PQconnectdb(partner_dsn);
        // Check if successfully reconnected
        if (!connectionEstablished())
            panic();

        // Check the attempted transaction's status
        sql = "SELECT bdr.logical_transaction_status($node_id, $xid)";
        txn_status = PQexec(conn, sql);
        if (txn_status == "committed")
            return SUCCESS;
        else
            continue;   // to retry the transaction on the partner
    }
    else
    {
        // The connection is intact, but the transaction failed for some
        // other reason.  Differentiate between permanent and temporary
        // errors.
        if (isPermanentError())
            return FAILURE;
        else
        {
            // Determine an appropriate delay to back-off to account for
            // temporary failures due to congestion, so as to decrease
            // the overall load put on the servers.
            sleep(increasing_retry_delay);

            continue;
        }
    }
}

This example needs to be extended with proper logic for connecting, including retries and error handling. If using a load balancer (e.g. PgBouncer), re-connecting can be implemented by simply using PQreset. Ensure that the load balancer only ever redirects a client to a CAMO partner and not any other BDR node.

In practice, an upper limit on retries is recommended. Depending on the actions performed in the transaction, other temporary errors may be possible and need to be handled by retrying the transaction depending on the error code, similar to the best practices for handling deadlocks or serialization failures in SERIALIZABLE isolation mode.

Please see the reference client implementations provided as part of this documentation.

Interaction with DDL and global locks

Transactions protected by CAMO may contain DDL operations. Note however that DDL uses global locks, which already provide some synchronization among nodes; see DDL Locking Details for more information.

Combining CAMO with DDL not only imposes a higher latency, but also increases the chance of global deadlocks. We therefore recommend using a relatively low bdr.global_lock_timeout, which aborts the DDL and therefore resolves a deadlock in a reasonable amount of time.
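For example, a session issuing CAMO-protected DDL could set a low lock timeout first (a sketch; '10s' is an arbitrary illustrative value):

SET bdr.global_lock_timeout = '10s';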

Non-transactional DDL

The following DDL operations are not allowed within a transaction block and therefore cannot possibly benefit from CAMO protection. For these, CAMO is automatically disabled internally:

  • all concurrent index operations (CREATE, DROP, and REINDEX)
  • REINDEX DATABASE, REINDEX SCHEMA, and REINDEX SYSTEM
  • VACUUM
  • CLUSTER without any parameter
  • ALTER TABLE DETACH PARTITION CONCURRENTLY
  • ALTER TYPE [enum] ADD VALUE
  • ALTER SYSTEM
  • CREATE and DROP DATABASE
  • CREATE and DROP TABLESPACE
  • ALTER DATABASE [db] TABLESPACE

CAMO Limitations

CAMO is designed to query the results of a recently failed COMMIT on the origin node, so in case of disconnection, the application should be coded to immediately request the transaction status from the CAMO partner. There should be as little delay as possible after the failure before requesting the status. Applications should not rely on CAMO decisions being stored for longer than 15 minutes.

If the application forgets the global identifier assigned, for example as a result of a restart, there is no easy way to recover it. Therefore, it is recommended that applications wait for outstanding transactions to terminate before shutting down.

For the client to apply proper checks, a transaction protected by CAMO cannot be a single statement with implicit transaction control. Nor is it possible to use CAMO with a transaction-controlling procedure or within a DO block that tries to start or end transactions.

Changing the CAMO partners in a CAMO pair is not currently possible. It's only possible to add or remove a pair. Adding or removing a pair does not need a restart of Postgres or even a reload of the configuration.

CAMO resolves commit status but does not yet resolve pending notifications on commit. CAMO and Eager replication options do not allow the NOTIFY SQL command or the pg_notify() function, nor do they allow LISTEN or UNLISTEN.

Replacing a crashed and unrecoverable BDR node with its physical standby is not currently supported in combination with CAMO-protected transactions.

Also, CAMO does not currently work together with the Decoding Worker. Installations using CAMO must keep enable_wal_decoder disabled for the BDR node group using CAMO.

Legacy BDR synchronous replication uses a mechanism for transaction confirmation different from CAMO. The two are not compatible and must not be used together. Therefore, a CAMO partner must not be configured in synchronous_standby_names. Using synchronous replication to a non-BDR node acting as a physical standby remains possible.

When replaying changes, CAMO transactions may detect conflicts just the same as other transactions. If timestamp conflict detection is used, the CAMO transaction uses the timestamp of the prepare on the origin node, which is before the transaction becomes visible on the origin node itself.

Performance Implications

CAMO extends Postgres' replication protocol by adding an additional message round trip at commit. Applications should expect a higher commit latency than with asynchronous replication, determined mostly by the round trip time between the involved nodes. Increasing the number of concurrent sessions can increase parallelism and help maintain reasonable transaction throughput.

The CAMO partner confirming transactions needs to store transaction states. Again, compared to non-CAMO operation, this might require an additional seek for each transaction applied from the origin.

Client Application Testing

Proper use of CAMO on the client side is not trivial; we strongly recommend testing the application behavior in combination with the BDR cluster against failure scenarios such as node crashes or network outages.