Conflicts v4

BDR is an active/active or multi-master DBMS. If used asynchronously, writes to the same or related rows from multiple different nodes can result in data conflicts when using standard data types.

Conflicts aren't errors. In most cases, they are events that BDR can detect and resolve as they occur. Resolution depends on the nature of the application and the meaning of the data, so it's important that BDR provides the application a range of choices as to how to resolve conflicts.

By default, conflicts are resolved at the row level. When changes from two nodes conflict, either the local or remote tuple is picked and the other is discarded. For example, the commit timestamps might be compared for the two conflicting changes and the newer one kept. This approach ensures that all nodes converge to the same result and establishes commit-order-like semantics on the whole cluster.

Conflict handling is configurable, as described in Conflict resolution. Conflicts can be detected and handled differently for each table using conflict triggers, described in Stream triggers.

Column-level conflict detection and resolution is available with BDR, described in CLCD.

If you want to avoid conflicts, you can use these features in BDR.

By default, all conflicts are logged to bdr.conflict_history. If conflicts are possible, then table owners must monitor for them and analyze how to avoid them or make plans to handle them regularly as an application task. The LiveCompare tool is also available to scan regularly for divergence.

Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close to each other but can't support geographically distributed applications where very low latency is critical for acceptable performance.

Distributed locking is essentially a pessimistic approach. BDR advocates an optimistic approach, which is to avoid conflicts where possible but allow some types of conflicts to occur and resolve them when they arise.

Upgrade Notes

All the SQL-visible interfaces are in the bdr schema. All the previously deprecated interfaces in the bdr_conflicts or bdr_crdt schema were removed and don't work on 3.7+ nodes or in groups that contain at least one 3.7+ node. Use the ones in the bdr schema that are already present in all BDR versions.

How conflicts happen

Inter-node conflicts arise as a result of sequences of events that can't happen if all the involved transactions happen concurrently on the same node. Because the nodes exchange changes only after the transactions commit, each transaction is individually valid on the node it committed on. It isn't valid if applied on another node that did other conflicting work at the same time.

Since BDR replication essentially replays the transaction on the other nodes, the replay operation can fail if there's a conflict between a transaction being applied and a transaction that was committed on the receiving node.

Most conflicts can't happen when all transactions run on a single node because Postgres has inter-transaction communication mechanisms to prevent it such as UNIQUE indexes, SEQUENCE operations, row and relation locking, and SERIALIZABLE dependency tracking. All of these mechanisms are ways to communicate between ongoing transactions to prevent undesirable concurrency issues.

BDR doesn't have a distributed transaction manager or lock manager. That's part of why it performs well with latency and network partitions. As a result, transactions on different nodes execute entirely independently from each other when using the default, lazy replication. Less independence between nodes can avoid conflicts altogether, which is why BDR also offers Eager Replication for when this is important.

Types of conflict

PRIMARY KEY or UNIQUE conflicts

The most common conflicts are row conflicts, where two operations affect a row with the same key in ways they can't on a single node. BDR can detect most of those and applies the update_if_newer conflict resolver.

Row conflicts include:

  • INSERT versus INSERT
  • UPDATE versus UPDATE
  • UPDATE versus DELETE
  • INSERT versus UPDATE
  • INSERT versus DELETE
  • DELETE versus DELETE

The view bdr.node_conflict_resolvers provides information on how conflict resolution is currently configured for all known conflict types.

INSERT/INSERT conflicts

The most common conflict, INSERT/INSERT, arises where INSERT operations on two different nodes create a tuple with the same PRIMARY KEY values (or if no PRIMARY KEY exists, the same values for a single UNIQUE constraint).

BDR handles this situation by retaining the most recently inserted tuple of the two according to the originating node's timestamps, unless this behavior is overridden by a user-defined conflict handler.

This conflict generates the insert_exists conflict type, which is by default resolved by choosing the newer (based on commit time) row and keeping only that one (update_if_newer resolver). You can configure other resolvers. See Conflict resolution for details.

To resolve this conflict type, you can also use column-level conflict resolution and user-defined conflict triggers.

You can effectively eliminate this type of conflict by using global sequences.

INSERT operations that violate multiple UNIQUE constraints

An INSERT/INSERT conflict can violate more than one UNIQUE constraint (of which one might be the PRIMARY KEY). If a new row violates more than one UNIQUE constraint and that results in a conflict against more than one other row, then the apply of the replication change produces a multiple_unique_conflicts conflict.

In case of such a conflict, you must remove some rows for replication to continue. Depending on the resolver setting for multiple_unique_conflicts, the apply process either exits with error, skips the incoming row, or deletes some of the rows. The deletion tries to preserve the row with the correct PRIMARY KEY and delete the others.

Warning

In case of multiple rows conflicting this way, if the result of conflict resolution is to proceed with the insert operation, some of the data is always deleted.

It's also possible to define a different behavior using a conflict trigger.

UPDATE/UPDATE conflicts

Where two concurrent UPDATE operations on different nodes change the same tuple (but not its PRIMARY KEY), an UPDATE/UPDATE conflict can occur on replay.

These can generate different conflict kinds based on the configuration and situation. If the table is configured with row version conflict detection, then the original (key) row is compared with the local row. If they're different, the update_differing conflict is generated. When using [Origin conflict detection], the origin of the row is checked (the origin is the node that the current local row came from). If that changed, the update_origin_change conflict is generated. In all other cases, the UPDATE is normally applied without generating a conflict.

Both of these conflicts are resolved the same way as insert_exists, described in INSERT/INSERT conflicts.

UPDATE conflicts on the PRIMARY KEY

BDR can't currently perform conflict resolution where the PRIMARY KEY is changed by an UPDATE operation. You can update the primary key, but you must ensure that no conflict with existing values is possible.

Conflicts on the update of the primary key are Divergent conflicts and require manual intervention.

Updating a primary key is possible in Postgres, but there are issues in both Postgres and BDR.

A simple schema provides an example that explains:

CREATE TABLE pktest (pk integer primary key, val integer);
INSERT INTO pktest VALUES (1,1);

Updating the Primary Key column is possible, so this SQL succeeds:

UPDATE pktest SET pk=2 WHERE pk=1;

However, suppose there are multiple rows in the table:

INSERT INTO pktest VALUES (3,3);

Some UPDATEs succeed:

UPDATE pktest SET pk=4 WHERE pk=3;

SELECT * FROM pktest;
 pk | val
----+-----
  2 |   1
  4 |   3
(2 rows)

Other UPDATEs fail with constraint errors:

UPDATE pktest SET pk=4 WHERE pk=2;
ERROR:  duplicate key value violates unique constraint "pktest_pkey"
DETAIL:  Key (pk)=(4) already exists

So for Postgres applications that update primary keys, be careful to avoid runtime errors, even without BDR.

With BDR, the situation becomes more complex if UPDATEs are allowed from multiple locations at same time.

Executing these two changes concurrently works:

node1: UPDATE pktest SET pk=pk+1 WHERE pk = 2;
node2: UPDATE pktest SET pk=pk+1 WHERE pk = 4;

SELECT * FROM pktest;
 pk | val
----+-----
  3 |   1
  5 |   3
(2 rows)

Executing these next two changes concurrently causes a divergent error, since both changes are accepted. But applying the changes on the other node results in update_missing conflicts.

node1: UPDATE pktest SET pk=1 WHERE pk = 3;
node2: UPDATE pktest SET pk=2 WHERE pk = 3;

This scenario leaves the data different on each node:

node1:
SELECT * FROM pktest;
 pk | val
----+-----
  1 |   1
  5 |   3
(2 rows)

node2:
SELECT * FROM pktest;
 pk | val
----+-----
  2 |   1
  5 |   3
(2 rows)

You can identify and resolve this situation using LiveCompare.

Concurrent conflicts present problems. Executing these two changes concurrently isn't easy to resolve:

node1: UPDATE pktest SET pk=6, val=8 WHERE pk = 5;
node2: UPDATE pktest SET pk=6, val=9 WHERE pk = 5;

Both changes are applied locally, causing a divergence between the nodes. But then apply on the target fails on both nodes with a duplicate key-value violation error, which causes the replication to halt and requires manual resolution.

This duplicate key violation error can now be avoided, and replication doesn't break if you set the conflict_type update_pkey_exists to skip, update, or update_if_newer. This can still lead to divergence depending on the nature of the update.

You can avoid divergence in cases where the same old key is being updated by the same new key concurrently by setting update_pkey_exists to update_if_newer. However, in certain situations, divergence occurs even with update_if_newer, namely when two different rows both are updated concurrently to the same new primary key.

As a result, we recommend strongly against allowing primary key UPDATE operations in your applications, especially with BDR. If parts of your application change primary keys, then to avoid concurrent changes, make those changes using Eager Replication.

Warning

In case the conflict resolution of update_pkey_exists conflict results in update, one of the rows is always deleted.

UPDATE operations that violate multiple UNIQUE constraints

Like INSERT operations that violate multiple UNIQUE constraints, where an incoming UPDATE violates more than one UNIQUE index (or the PRIMARY KEY), BDR raises a multiple_unique_conflicts conflict.

BDR supports deferred unique constraints. If a transaction can commit on the source, then it applies cleanly on target, unless it sees conflicts. However, a deferred primary key can't be used as a REPLICA IDENTITY, so the use cases are already limited by that and the warning about using multiple unique constraints.

UPDATE/DELETE conflicts

It's possible for one node to update a row that another node simultaneously deletes. In this case an UPDATE/DELETE conflict can occur on replay.

If the deleted row is still detectable (the deleted row wasn't removed by VACUUM), the update_recently_deleted conflict is generated. By default the UPDATE is skipped, but you can configure the resolution for this. See Conflict resolution for details.

The deleted row can be cleaned up from the database by the time the UPDATE is received in case the local node is lagging behind in replication. In this case, BDR can't differentiate between UPDATE/DELETE conflicts and INSERT/UPDATE conflicts and generates the update_missing conflict.

Another type of conflicting DELETE and UPDATE is a DELETE that comes after the row was updated locally. In this situation, the outcome depends on the type of conflict detection used. When using the default, origin conflict detection, no conflict is detected at all, leading to the DELETE being applied and the row removed. If you enable row version conflict detection, a delete_recently_updated conflict is generated. The default resolution for this conflict type is to apply the DELETE and remove the row, but you can configure this or this can be handled by a conflict trigger.

INSERT/UPDATE conflicts

When using the default asynchronous mode of operation, a node might receive an UPDATE of a row before the original INSERT was received. This can happen only with three or more nodes being active (see Conflicts with three or more nodes).

When this happens, the update_missing conflict is generated. The default conflict resolver is insert_or_skip, though you can use insert_or_error or skip instead. Resolvers that do insert-or-action first try to INSERT a new row based on data from the UPDATE when possible (when the whole row was received). For the reconstruction of the row to be possible, the table either needs to have REPLICA IDENTITY FULL or the row must not contain any toasted data.

See TOAST support details for more info about toasted data.

INSERT/DELETE conflicts

Similar to the INSERT/UPDATE conflict, the node might also receive a DELETE operation on a row for which it didn't yet receive an INSERT. This is again possible only with three or more nodes set up (see Conflicts with three or more nodes).

BDR can't currently detect this conflict type. The INSERT operation doesn't generate any conflict type and the INSERT is applied.

The DELETE operation always generates a delete_missing conflict, which is by default resolved by skipping the operation.

DELETE/DELETE conflicts

A DELETE/DELETE conflict arises when two different nodes concurrently delete the same tuple.

This always generates a delete_missing conflict, which is by default resolved by skipping the operation.

This conflict is harmless since both DELETE operations have the same effect. One of them can be safely ignored.

Conflicts with three or more nodes

If one node inserts a row that is then replayed to a second node and updated there, a third node can receive the UPDATE from the second node before it receives the INSERT from the first node. This scenario is an INSERT/UPDATE conflict.

These conflicts are handled by discarding the UPDATE. This can lead to different data on different nodes. These are [divergent conflicts](#divergent conflicts).

This conflict type can happen only with three or more masters, of which at least two must be actively writing.

Also, the replication lag from node 1 to node 3 must be high enough to allow the following sequence of actions:

  1. node 2 receives INSERT from node 1
  2. node 2 performs UPDATE
  3. node 3 receives UPDATE from node 2
  4. node 3 receives INSERT from node 1

Using insert_or_error (or in some cases the insert_or_skip conflict resolver for the update_missing conflict type) is a viable mitigation strategy for these conflicts. However, enabling this option opens the door for INSERT/DELETE conflicts:

  1. node 1 performs UPDATE
  2. node 2 performs DELETE
  3. node 3 receives DELETE from node 2
  4. node 3 receives UPDATE from node 1, turning it into an INSERT

If these are problems, we recommend tuning freezing settings for a table or database so that they are correctly detected as update_recently_deleted.

Another alternative is to use Eager Replication to prevent these conflicts.

INSERT/DELETE conflicts can also occur with three or more nodes. Such a conflict is identical to INSERT/UPDATE except with the UPDATE replaced by a DELETE. This can result in a delete_missing conflict.

BDR could choose to make each INSERT into a check-for-recently deleted, as occurs with an update_missing conflict. However, the cost of doing this penalizes the majority of users, so at this time it simply logs delete_missing.

Later releases will automatically resolve INSERT/DELETE anomalies via rechecks using LiveCompare when delete_missing conflicts occur. These can be performed manually by applications by checking the bdr.conflict_history_summary view.

These conflicts can occur in two main problem use cases:

  • INSERT followed rapidly by a DELETE, as can be used in queuing applications
  • Any case where the primary key identifier of a table is reused

Neither of these cases is common. We recommend not replicating the affected tables if these problem use cases occur.

BDR has problems with the latter case because BDR relies on the uniqueness of identifiers to make replication work correctly.

Applications that insert, delete, and then later reuse the same unique identifiers can cause difficulties. This is known as the ABA problem. BDR has no way of knowing whether the rows are the current row, the last row, or much older rows.

Unique identifier reuse is also a business problem, since it is prevents unique identification over time, which prevents auditing, traceability, and sensible data quality. Applications don't need to reuse unique identifiers.

Any identifier reuse that occurs in the time interval it takes for changes to pass across the system causes difficulties. Although that time might be short in normal operation, down nodes can extend that interval to hours or days.

We recommend that applications don't reuse unique identifiers, but if they do, take steps to avoid reuse within a period of less than a year.

This problem doesn't occur in applications that use sequences or UUIDs.

Foreign key constraint conflicts

Conflicts between a remote transaction being applied and existing local data can also occur for FOREIGN KEY (FK) constraints.

BDR applies changes with session_replication_role = 'replica', so foreign keys aren't rechecked when applying changes. In an active/active environment, this can result in FK violations if deletes occur to the referenced table at the same time as inserts into the referencing table. This is similar to an INSERT/DELETE conflict.

In single-master Postgres, any INSERT/UPDATE that refers to a value in the referenced table must wait for DELETE operations to finish before they can gain a row-level lock. If a DELETE removes a referenced value, then the INSERT/UPDATE fails the FK check.

In multi-master BDR. there are no inter-node row-level locks. An INSERT on the referencing table doesn't wait behind a DELETE on the referenced table, so both actions can occur concurrently. Thus an INSERT/UPDATE on one node on the referencing table can use a value at the same time as a DELETE on the referenced table on another node. This then results in a value in the referencing table that's no longer present in the referenced table.

In practice, this occurs if the DELETE operations occurs on referenced tables in separate transactions from DELETE operations on referencing tables. This isn't a common operation.

In a parent-child relationship such as Orders -> OrderItems, it isn't typical to do this. It's more likely to mark an OrderItem as canceled than to remove it completely. For reference/lookup data, it's unusual to completely remove entries at the same time as using those same values for new fact data.

While there's a possibility of dangling FKs, the risk of this in general is very low and so BDR doesn't impose a generic solution to cover this case. Once you understand the situation in which this occurs, two solutions are possible.

The first solution is to restrict the use of FKs to closely related entities that are generally modified from only one node at a time, are infrequently modified, or where the modification's concurrency is application-mediated. This avoids any FK violations at the application level.

The second solution is to add triggers to protect against this case using the BDR-provided functions bdr.ri_fkey_trigger() and bdr.ri_fkey_on_del_trigger(). When called as BEFORE triggers, these functions use FOREIGN KEY information to avoid FK anomalies by setting referencing columns to NULL, much as if you had a SET NULL constraint. This rechecks all FKs in one trigger, so you need to add only one trigger per table to prevent FK violation.

As an example, suppose you have two tables: Fact and RefData. Fact has an FK that references RefData. Fact is the referencing table and RefData is the referenced table. You need to add one trigger to each table.

Add a trigger that sets columns to NULL in Fact if the referenced row in RefData was already deleted.

CREATE TRIGGER bdr_replica_fk_iu_trg
    BEFORE INSERT OR UPDATE ON fact
    FOR EACH ROW
    EXECUTE PROCEDURE bdr.ri_fkey_trigger();

ALTER TABLE fact
    ENABLE REPLICA TRIGGER bdr_replica_fk_iu_trg;

Add a trigger that sets columns to NULL in Fact at the time a DELETE occurs on the RefData table.

CREATE TRIGGER bdr_replica_fk_d_trg
    BEFORE DELETE ON refdata
    FOR EACH ROW
    EXECUTE PROCEDURE bdr.ri_fkey_on_del_trigger();

ALTER TABLE refdata
    ENABLE REPLICA TRIGGER bdr_replica_fk_d_trg;

Adding both triggers avoids dangling foreign keys.

TRUNCATE conflicts

TRUNCATE behaves similarly to a DELETE of all rows but performs this action by physically removing the table data rather than row-by-row deletion. As a result, row-level conflict handling isn't available, so TRUNCATE commands don't generate conflicts with other DML actions, even when there's a clear conflict.

As a result, the ordering of replay can cause divergent changes if another DML is executed concurrently on other nodes to the TRUNCATE.

You can take one of the following actions:

  • Ensure TRUNCATE isn't executed alongside other concurrent DML. Rely on LiveCompare to highlight any such inconsistency.

  • Replace TRUNCATE with a DELETE statement with no WHERE clause. This approach is likely to have very poor performance on larger tables.

  • Set bdr.truncate_locking = 'on' to set the TRUNCATE command’s locking behavior. This setting determines whether TRUNCATE obeys the bdr.ddl_locking setting. This isn't the default behavior for TRUNCATE since it requires all nodes to be up. This configuration might not be possible or wanted in all cases.

Exclusion constraint conflicts

BDR doesn't support exclusion constraints and prevents their creation.

If an existing standalone database is converted to a BDR database, then drop all exclusion constraints manually.

In a distributed asynchronous system, you can't ensure that no set of rows that violate the constraint exists, because all transactions on different nodes are fully isolated. Exclusion constraints lead to replay deadlocks where replay can't progress from any node to any other node because of exclusion constraint violations.

If you force BDR to create an exclusion constraint, or you don't drop existing ones when converting a standalone database to BDR, expect replication to break. To get it to progress again, remove or alter the local tuples that an incoming remote tuple conflicts with so that the remote transaction can be applied.

Data conflicts for roles and tablespace differences

Conflicts can also arise where nodes have global (Postgres-system-wide) data, like roles, that differ. This can result in operationsmainly DDLthat can run successfully and commit on one node but then fail to apply to other nodes.

For example, node1 might have a user named fred, and that user wasn't created on node2. If fred on node1 creates a table, the table is replicated with its owner set to fred. When the DDL command is applied to node2, the DDL fails because there's no user named fred. This failure emits an error in the Postgres logs.

Administrator intervention is required to resolve this conflict by creating the user fred in the database where BDR is running. You can set bdr.role_replication = on to resolve this in future.

Lock conflicts and deadlock aborts

Because BDR writer processes operate much like normal user sessions, they're subject to the usual rules around row and table locking. This can sometimes lead to BDR writer processes waiting on locks held by user transactions or even by each other.

Relevant locking includes:

  • Explicit table-level locking (LOCK TABLE ...) by user sessions
  • Explicit row-level locking (SELECT ... FOR UPDATE/FOR SHARE) by user sessions
  • Implicit locking because of row UPDATE, INSERT, or DELETE operations, either from local activity or from replication from other nodes

A BDR writer process can deadlock with a user transaction, where the user transaction is waiting on a lock held by the writer process and vice versa. Two writer processes can also deadlock with each other. Postgres's deadlock detector steps in and terminates one of the problem transactions. If the BDR writer process is terminated, it retries and generally succeeds.

All these issues are transient and generally require no administrator action. If a writer process is stuck for a long time behind a lock on an idle user session, the administrator can terminate the user session to get replication flowing again, but this is no different from a user holding a long lock that impacts another user session.

Use of the log_lock_waits facility in Postgres can help identify locking related replay stalls.

Divergent conflicts

Divergent conflicts arise when data that should be the same on different nodes differs unexpectedly. Divergent conflicts should not occur, but not all such conflicts can be reliably prevented at the time of writing.

Changing the PRIMARY KEY of a row can lead to a divergent conflict if another node changes the key of the same row before all nodes have replayed the change. Avoid changing primary keys, or change them only on one designated node.

Divergent conflicts involving row data generally require administrator action to manually adjust the data on one of the nodes to be consistent with the other one. Such conflicts don't arise so long as you use BDR as documented and avoid settings or functions marked as unsafe.

The administrator must manually resolve such conflicts. You might need to use the advanced options such as bdr.ddl_replication and bdr.ddl_locking depending on the nature of the conflict. However, careless use of these options can make things much worse and create a conflict that generic instructions can't address.

TOAST support details

Postgres uses out-of-line storage for larger columns called TOAST.

The TOAST values handling in logical decoding (which BDR is built on top of) and logical replication is different from inline data stored as part of the main row in the table.

The TOAST value is logged into the transaction log (WAL) only if the value has changed. This can cause problems, especially when handling UPDATE conflicts because an UPDATE statement that didn't change a value of a toasted column produces a row without that column. As mentioned in INSERT/UPDATE conflicts, BDR reports an error if an update_missing conflict is resolved using insert_or_error and there are missing TOAST columns.

However, there are more subtle issues than this one in case of concurrent workloads with asynchronous replication (Eager transactions aren't affected). Imagine, for example, the following workload on a BDR cluster with three nodes called A, B, and C:

  1. On node A: txn A1 does an UPDATE SET col1 = 'toast data...' and commits first.
  2. On node B: txn B1 does UPDATE SET other_column = 'anything else'; and commits after A1.
  3. On node C: the connection to node A lags behind.
  4. On node C: txn B1 is applied first, it misses the TOASTed column in col1, but gets applied without conflict.
  5. On node C: txn A1 conflicts (on update_origin_change) and is skipped.
  6. Node C misses the toasted data from A1 forever.

This scenario isn't usually a problem when using BDR. (It is when using either built-in logical replication or plain pglogical for multi-master.) BDR adds its own logging of TOAST columns when it detects a local UPDATE to a row that recently replicated a TOAST column modification and the local UPDATE isn't modifying the TOAST. Thus BDR prevents any inconsistency for toasted data across different nodes. This situation causes increased WAL logging when updates occur on multiple nodes (that is, when origin changes for a tuple). Additional WAL overhead is zero if all updates are made from a single node, as is normally the case with BDR AlwaysOn architecture.

Note

Running VACUUM FULL or CLUSTER on just the TOAST table without also doing same on the main table removes metadata needed for the extra logging to work. This means that, for a short period of time after such a statement, the protection against these concurrency issues isn't be present.

Warning

The additional WAL logging of TOAST is done using the BEFORE UPDATE trigger on standard Postgres. This trigger must be sorted alphabetically last (based on trigger name) among all BEFORE UPDATE triggers on the table. It's prefixed with zzzz_bdr_ to make this easier, but make sure you don't create any trigger with a name that sorts after it. Otherwise you won't have the protection against the concurrency issues.

For the insert_or_error conflict resolution, the use of REPLICA IDENTITY FULL is, however, still required.

None of these problems associated with toasted columns affect tables with REPLICA IDENTITY FULL. This setting always logs a toasted value as part of the key since the whole row is considered to be part of the key. BDR can reconstruct the new row, filling the missing data from the key row. As a result, using REPLICA IDENTITY FULL can increase WAL size significantly.

Avoiding or tolerating conflicts

In most cases, you can design the application to avoid or tolerate conflicts.

Conflicts can happen only if things are happening at the same time on multiple nodes. The simplest way to avoid conflicts is to only ever write to one node or to only ever write to a specific row in a specific way from one specific node at a time.

This happens naturally in many applications. For example, many consumer applications allow data to be changed only by the owning user, such as changing the default billing address on your account. Such data changes seldom have update conflicts.

You might make a change just before a node goes down, so the change seems to be lost. You might then make the same change again, leading to two updates on different nodes. When the down node comes back up, it tries to send the older change to other nodes, but it's rejected because the last update of the data is kept.

For INSERT/INSERT conflicts, use global sequences to prevent this type of conflict.

For applications that assign relationships between objects, such as a room booking application, applying update_if_newer might not give an acceptable business outcome. That is, it isn't useful to confirm to two people separately that they have booked the same room. The simplest resolution is to use Eager Replication to ensure that only one booking succeeds. More complex ways might be possible depending on the application. For example, you can assign 100 seats to each node and allow those to be booked by a writer on that node. But if none are available locally, use a distributed locking scheme or Eager Replication once most seats are reserved.

Another technique for ensuring certain types of updates occur only from one specific node is to route different types of transactions through different nodes. For example:

  • Receiving parcels on one node but delivering parcels using another node
  • A service application where orders are input on one node, work is prepared on a second node, and then served back to customers on another

Frequently, the best course is to allow conflicts to occur and design the application to work with BDR's conflict resolution mechanisms to cope with the conflict.

Conflict detection

BDR provides these mechanisms for conflict detection:

Origin conflict detection

Origin conflict detection uses and relies on commit timestamps as recorded on the node the transaction originates from. This requires clocks to be in sync to work correctly or to be within a tolerance of the fastest message between two nodes. If this isn't the case, conflict resolution tends to favor the node that's further ahead. You can manage clock skew between nodes using the parameters bdr.maximum_clock_skew and bdr.maximum_clock_skew_action.

Row origins are available only if track_commit_timestamp = on.

Conflicts are initially detected based on whether the replication origin changed, so conflict triggers are called in situations that might turn out not to be conflicts. Hence, this mechanism isn't precise, since it can generate false-positive conflicts.

Origin info is available only up to the point where a row is frozen. Updates arriving for a row after it was frozen don't raise a conflict so are applied in all cases. This is the normal case when adding a new node by bdr_init_physical, so raising conflicts causes many false-positive results in that case.

When a node that was offline reconnects and begins sending data changes, this can cause divergent errors if the newly arrived updates are older than the frozen rows that they update. Inserts and deletes aren't affected by this situation.

We suggest that you don't leave down nodes for extended outages, as discussed in Node restart and down node recovery.

On EDB Postgres Extended Server and EDB Postgres Advanced Server, BDR holds back the freezing of rows while a node is down. This mechanism handles this situation gracefully so you don't need to change parameter settings.

On other variants of Postgres, you might need to manage this situation with some care.

Freezing normally occurs when a row being vacuumed is older than vacuum_freeze_min_age xids from the current xid, which means that you need to configure suitably high values for these parameters:

  • vacuum_freeze_min_age
  • vacuum_freeze_table_age
  • autovacuum_freeze_max_age

Choose values based on the transaction rate, giving a grace period of downtime before removing any conflict data from the database node. For example, when vacuum_freeze_min_age is set to 500 million, a node performing 1000 TPS can be down for just over 5.5 days before conflict data is removed. The CommitTS data structure takes on-disk space of 5 GB with that setting, so lower transaction rate systems can benefit from lower settings.

Initially recommended settings are:

# 1 billion = 10GB
autovacuum_freeze_max_age = 1000000000

vacuum_freeze_min_age = 500000000

# 90% of autovacuum_freeze_max_age
vacuum_freeze_table_age = 900000000

Note that:

  • You can set autovacuum_freeze_max_age only at node start.
  • You can set vacuum_freeze_min_age, so using a low value freezes rows early and can result in conflicts being ignored. You can also set autovacuum_freeze_min_age and toast.autovacuum_freeze_min_age for individual tables.
  • Running the CLUSTER or VACUUM FREEZE commands also freezes rows early and can result in conflicts being ignored.

Row version conflict detection

Alternatively, BDR provides the option to use row versioning and make conflict detection independent of the nodes' system clock.

Row version conflict detection requires that you enable three things. If any of these steps aren't performed correctly then origin conflict detection is used.

  1. check_full_tuple must be enabled for the BDR node group.

  2. REPLICA IDENTITY FULL must be enabled on all tables that use row version conflict detection.

  3. Row Version Tracking must be enabled on the table by using bdr.alter_table_conflict_detection. This function adds a column (with a name you specify) and an UPDATE trigger that manages the new column value. The column is created as INTEGER type.

Although the counter is incremented only on UPDATE, this technique allows conflict detection for both UPDATE and DELETE.

This approach resembles Lamport timestamps and fully prevents the ABA problem for conflict detection.

Note

The row-level conflict resolution is still handled based on the conflict resolution configuration even with row versioning. The way the row version is generated is useful only for detecting conflicts. Don't rely on it as authoritative information about which version of row is newer.

To determine the current conflict resolution strategy used for a specific table, refer to the column conflict_detection of the view bdr.tables.

bdr.alter_table_conflict_detection

Allows the table owner to change how conflict detection works for a given table.

Synopsis

bdr.alter_table_conflict_detection(relation regclass,
                                   method text,
                                   column_name name DEFAULT NULL)

Parameters

  • relation Name of the relation for which to set the new conflict detection method.
  • method The conflict detection method to use.
  • column_name The column to use for storing the column detection data. This can be skipped, in which case the column name is chosen based on the conflict detection method. The row_origin method doesn't require an extra column for metadata storage.

The recognized methods for conflict detection are:

  • row_origin Origin of the previous change made on the tuple (see Origin conflict detection). This is the only method supported that doesn't require an extra column in the table.
  • row_version Row version column (see Row version conflict detection).
  • column_commit_timestamp Per-column commit timestamps (described in CLCD).
  • column_modify_timestamp Per-column modification timestamp (described in CLCD).

Notes

For more information about the difference between column_commit_timestamp and column_modify_timestamp conflict detection methods, see Current versus commit timestamp.

This function uses the same replication mechanism as DDL statements. This means the replication is affected by the ddl filters configuration.

The function takes a DML global lock on the relation for which column-level conflict resolution is being enabled.

This function is transactional. You can roll back the effects back with the ROLLBACK of the transaction, and the changes are visible to the current transaction.

The bdr.alter_table_conflict_detection function can be executed only by the owner of the relation, unless bdr.backwards_compatibility is set to 30618 or below.

Warning

When changing the conflict detection method from one that uses an extra column to store metadata, that column is dropped.

Warning

This function disables CAMO (together with a warning, as long as these aren't disabled with bdr.camo_enable_client_warnings).

List of conflict types

BDR recognizes the following conflict types, which can be used as the conflict_type parameter:

  • insert_exists An incoming insert conflicts with an existing row via a primary key or a unique key/index.
  • update_differing An incoming update's key row differs from a local row. This can happen only when using row version conflict detection.
  • update_origin_change An incoming update is modifying a row that was last changed by a different node.
  • update_missing An incoming update is trying to modify a row that doesn't exist.
  • update_recently_deleted An incoming update is trying to modify a row that was recently deleted.
  • update_pkey_exists An incoming update has modified the PRIMARY KEY to a value that already exists on the node that's applying the change.
  • multiple_unique_conflicts The incoming row conflicts with multiple UNIQUE constraints/indexes in the target table.
  • delete_recently_updated An incoming delete with an older commit timestamp than the most recent update of the row on the current node, or when using [Row version conflict detection].
  • delete_missing An incoming delete is trying to remove a row that doesn't exist.
  • target_column_missing The target table is missing one or more columns present in the incoming row.
  • source_column_missing The incoming row is missing one or more columns that are present in the target table.
  • target_table_missing The target table is missing.
  • apply_error_ddl An error was thrown by Postgres when applying a replicated DDL command.

Conflict resolution

Most conflicts can be resolved automatically. BDR defaults to a last-update-wins mechanism or, more accurately, the update_if_newer conflict resolver. This mechanism retains the most recently inserted or changed row of the two conflicting ones based on the same commit timestamps used for conflict detection. The behavior in certain corner-case scenarios depends on the settings used for bdr.create_node_group and alternatively for bdr.alter_node_group.

BDR lets you override the default behavior of conflict resolution by using the following function:

bdr.alter_node_set_conflict_resolver

This function sets the behavior of conflict resolution on a given node.

Synopsis

bdr.alter_node_set_conflict_resolver(node_name text,
                                     conflict_type text,
                                     conflict_resolver text)

Parameters

  • node_name Name of the node that's being changed.
  • conflict_type Conflict type for which to apply the setting (see List of conflict types).
  • conflict_resolver Resolver to use for the given conflict type (see List of conflict resolvers).

Notes

Currently you can change only the local node. The function call isn't replicated. If you want to change settings on multiple nodes, you must run the function on each of them.

The configuration change made by this function overrides any default behavior of conflict resolutions specified by bdr.create_node_group or bdr.alter_node_group.

This function is transactional. You can roll back the changes, and they are visible to the current transaction.

List of conflict resolvers

Several conflict resolvers are available in BDR, with differing coverages of the conflict types they can handle:

  • error Throws error and stops replication. Can be used for any conflict type.
  • skip Skips processing the remote change and continues replication with the next change. Can be used for insert_exists, update_differing, update_origin_change, update_missing, update_recently_deleted, update_pkey_exists, delete_recently_updated, delete_missing, target_table_missing, target_column_missing, and source_column_missing conflict types.
  • skip_if_recently_dropped Skip the remote change if it's for a table that doesn't exist downstream because it was recently (within one day) dropped on the downstream; throw an error otherwise. Can be used for the target_table_missing conflict type. skip_if_recently_dropped conflict resolver can pose challenges if a table with the same name is re-created shortly after it's dropped. In that case, one of the nodes might see the DMLs on the re-created table before it sees the DDL to re-create the table. It then incorrectly skips the remote data, assuming that the table is recently dropped, and causes data loss. We hence recommend that you don't reuse the object namesq immediately after they are dropped along with this conflict resolver.
  • skip_transaction Skips the whole transaction that generated the conflict. Can be used for apply_error_ddl conflict.
  • update_if_newer Update if the remote row was committed later (as determined by the wall clock of the originating node) than the conflicting local row. If the timestamps are same, the node id is used as a tie-breaker to ensure that same row is picked on all nodes (higher nodeid wins). Can be used for insert_exists, update_differing, update_origin_change, and update_pkey_exists conflict types.
  • update Always perform the replicated action. Can be used for insert_exists (turns the INSERT into UPDATE), update_differing, update_origin_change, update_pkey_exists, and delete_recently_updated (performs the delete).
  • insert_or_skip Try to build a new row from available information sent by the origin and INSERT it. If there isn't enough information available to build a full row, skip the change. Can be used for update_missing and update_recently_deleted conflict types.
  • insert_or_error Try to build new row from available information sent by origin and insert it. If there isn't enough information available to build full row, throw an error and stop the replication. Can be used for update_missing and update_recently_deleted conflict types.
  • ignore Ignore any missing target column and continue processing. Can be used for the target_column_missing conflict type.
  • ignore_if_null Ignore a missing target column if the extra column in the remote row contains a NULL value. Otherwise, throw an error and stop replication. Can be used for the target_column_missing conflict type.
  • use_default_value Fill the missing column value with the default (including NULL if that's the column default) and continue processing. Any error while processing the default or violation of constraints (i.e., NULL default on NOT NULL column) stops replication. Can be used for the source_column_missing conflict type.

The insert_exists, update_differing, update_origin_change, update_missing, multiple_unique_conflicts, update_recently_deleted, update_pkey_exists, delete_recently_updated, and delete_missing` conflict types can also be resolved by user-defined logic using Conflict triggers.

This matrix helps you individuate the conflict types the conflict resolvers can handle.

insert_existsupdate_differingupdate_origin_changeupdate_missingupdate_recently_deletedupdate_pkey_existsdelete_recently_updateddelete_missingtarget_column_missingsource_column_missingtarget_table_missingmultiple_unique_conflicts
errorXXXXXXXXXXXX
skipXXXXXXXXXXXX
skip_if_recently_droppedX
update_if_newerXXXX
updateXXXXXX
insert_or_skipXX
insert_or_errorXX
ignoreX
ignore_if_nullX
use_default_valueX
conflict_triggerXXXXXXXXX

Default conflict resolvers

Conflict typeResolver
insert_existsupdate_if_newer
update_differingupdate_if_newer
update_origin_changeupdate_if_newer
update_missinginsert_or_skip
update_recently_deletedskip
update_pkey_existsupdate_if_newer
multiple_unique_conflictserror
delete_recently_updatedskip
delete_missingskip
target_column_missingignore_if_null
source_column_missinguse_default_value
target_table_missingskip_if_recently_dropped
apply_error_ddlerror

List of conflict resolutions

The conflict resolution represents the kind of resolution chosen by the conflict resolver and corresponds to the specific action that was taken to resolve the conflict.

The following conflict resolutions are currently supported for the conflict_resolution parameter:

  • apply_remote The remote (incoming) row was applied.
  • skip Processing of the row was skipped (no change was made locally).
  • merge A new row was created, merging information from remote and local row.
  • user User code (a conflict trigger) produced the row that was written to the target table.

Conflict logging

To ease the diagnosis and handling of multi-master conflicts, BDR, by default, logs every conflict into the bdr.conflict_history table. You can change this behavior with more granularity with the following functions.

bdr.alter_node_set_log_config

Set the conflict logging configuration for a node.

Synopsis

bdr.alter_node_set_log_config(node_name text,
                              log_to_file bool DEFAULT true,
                              log_to_table bool DEFAULT true,
                              conflict_type text[] DEFAULT NULL,
                              conflict_resolution text[] DEFAULT NULL)

Parameters

  • node_name Name of the node that's being changed.
  • log_to_file Whether to log to the node log file.
  • log_to_table Whether to log to the bdr.conflict_history table.
  • conflict_type Conflict types to log. NULL (the default) means all.
  • conflict_resolution Conflict resolutions to log. NULL (the default) means all.

Notes

Only the local node can be changed. The function call isn't replicated. If you want to change settings on multiple nodes, you must run the function on each of them.

This function is transactional. You can roll back the changes, and they are visible to the current transaction.

Listing conflict logging configurations

The view bdr.node_log_config shows all the logging configurations. It lists the name of the logging configuration, where it logs, and the conflict type and resolution it logs.

Logging conflicts to a table

Conflicts are logged to a table if log_to_table is set to true. The target table for conflict logging is bdr.conflict_history.

This table is range partitioned on the column local_time. The table is managed by Autopartition. By default, a new partition is created for every day, and conflicts of the last one month are maintained. After that, the old partitions are dropped automatically. Autopartition creates between 7 and 14 partitions in advance. bdr_superuser can change these defaults.

Since conflicts generated for all tables managed by BDR are logged to this table, it's important to ensure that only legitimate users can read the conflicted data. BDR does this by defining ROW LEVEL SECURITY policies on the bdr.conflict_history table. Only owners of the tables are allowed to read conflicts on the respective tables. If the underlying tables have RLS policies defined, enabled, and enforced, then even owners can't read the conflicts. RLS policies created with the FORCE option also apply to owners of the table. In that case, some or all rows in the underlying table might not be readable even to the owner. So BDR also enforces a stricter policy on the conflict log table.

The default role bdr_read_all_conflicts can be granted to users who need to see all conflict details logged to the bdr.conflict_history table without also granting them bdr_superuser role.

The default role bdr_read_all_stats has access to a catalog view called bdr.conflict_history_summary, which doesn't contain user data, allowing monitoring of any conflicts logged.

Conflict reporting

Conflicts logged to tables can be summarized in reports. Reports allow application owners to identify, understand, and resolve conflicts and introduce application changes to prevent them.

SELECT nspname, relname
, date_trunc('day', local_time) :: date AS date
, count(*)
FROM bdr.conflict_history
WHERE local_time > date_trunc('day', current_timestamp)
GROUP BY 1,2,3
ORDER BY 1,2;

 nspname | relname |    date    | count
---------+---------+------------+-------
 my_app  | test    | 2019-04-05 |     1
(1 row)

Data verification with LiveCompare

LiveCompare is a utility program designed to compare any two databases to verify that they are identical.

LiveCompare is included as part of the BDR stack and can be aimed at any pair of BDR nodes. By default, it compares all replicated tables and reports differences. LiveCompare also works with non-BDR data sources such as Postgres and Oracle.

You can also use LiveCompare to continuously monitor incoming rows. You can stop and start it without losing context information, so you can run it at convenient times.

LiveCompare allows concurrent checking of multiple tables. You can configure it to allow checking of a few tables or just a section of rows within a table. Checks are performed by first comparing whole row hashes. If different, LiveCompare then compares whole rows. LiveCompare avoids overheads by comparing rows in useful-sized batches.

If differences are found, they can be rechecked over a period, allowing for the delays of eventual consistency.

Refer to the LiveCompare documentation for further details.