DDL replication v4

DDL stands for data definition language, the subset of the SQL language that creates, alters, and drops database objects.

For operational convenience and correctness, BDR replicates most DDL actions, with these exceptions:

  • Temporary or unlogged relations
  • Certain DDL statements (mostly long running)
  • Locking commands (LOCK)
  • Table maintenance commands (VACUUM, ANALYZE, CLUSTER, REINDEX)
  • Actions of autovacuum
  • Operational commands (CHECKPOINT, ALTER SYSTEM)
  • Actions related to databases or tablespaces

Automatic DDL replication makes certain DDL changes easier without having to manually distribute the DDL change to all nodes and ensure that they are consistent.

In the default replication set, DDL is replicated to all nodes by default. To replicate DDL, you must add a DDL replication filter to the replication set. See DDL replication filtering.

BDR is significantly different from standalone PostgreSQL when it comes to DDL replication. Treating it the same is the most common issue with BDR.

The main difference from table replication is that DDL replication doesn't replicate the result of the DDL but the statement itself. This works very well in most cases, although it introduces the requirement that the DDL must execute similarly on all nodes. A more subtle point is that the DDL must be immutable with respect to all datatype-specific parameter settings, including any datatypes introduced by extensions (not built-in). For example, the DDL statement must execute correctly in the default encoding used on each node.

DDL replication options

The bdr.ddl_replication parameter specifies replication behavior.

bdr.ddl_replication = on is the default and replicates DDL to the default replication set, which by default means all nodes. Nondefault replication sets don't replicate DDL unless they have a DDL filter defined for them.

You can also replicate DDL to specific replication sets using the function bdr.replicate_ddl_command(). This can be helpful if you want to run DDL commands when a node is down or if you want to have indexes or partitions that exist on a subset of nodes or rep sets, for example, all nodes at site1.

SELECT bdr.replicate_ddl_command(
				'CREATE INDEX CONCURRENTLY ON foo (col7);',
				ARRAY['site1'],     -- the replication sets
                'on');              -- ddl_locking to apply

While we don't recommend it, you can skip automatic DDL replication and execute it manually on each node using bdr.ddl_replication configuration parameters.

SET bdr.ddl_replication = off;

When set, it makes BDR skip both the global locking and the replication of executed DDL commands. You must then run the DDL manually on all nodes.

Warning

Executing DDL manually on each node without global locking can cause the whole BDR group to stop replicating if conflicting DDL or DML executes concurrently.

The bdr.ddl_replication parameter can be set only by the bdr_superuser, by superuser, or in the config file.

Executing DDL on BDR systems

A BDR group isn't the same as a standalone PostgreSQL server. It's based on asynchronous multi-master replication without central locking and without a transaction coordinator. This has important implications when executing DDL.

DDL that executes in parallel continues to do so with BDR. DDL execution respects the parameters that affect parallel operation on each node as it executes, so you might notice differences in the settings between nodes.

Prevent the execution of conflicting DDL, otherwise DDL replication causes errors and the replication stops.

BDR offers three levels of protection against those problems:

ddl_locking = 'dml' is the best option for operations, usable when you execute DDL from only one node at a time. This isn't the default, but we recommend that you use this setting if you can control where DDL is executed from. Doing so ensures that there are no inter-node conflicts. Intra-node conflicts are already handled by PostgreSQL.

ddl_locking = on is the strictest option and is best when DDL might execute from any node concurrently and you want to ensure correctness.

ddl_locking = off is the least strict option and is dangerous in general use. This option skips locks altogether, avoiding any performance overhead, which makes it a useful option when creating a new and empty database schema.

These options can be set only by the bdr_superuser, by the superuser, or in the config file.

When using the bdr.replicate_ddl_command, you can set this parameter directly with the third argument, using the specified bdr.ddl_locking setting only for the DDL commands passed to that function.

DDL locking details

Two kinds of locks enforce correctness of replicated DDL with BDR.

The first kind is known as a global DDL lock and is used only when ddl_locking = on. A global DDL lock prevents any other DDL from executing on the cluster while each DDL statement runs. This ensures full correctness in the general case but is too strict for many simple cases. BDR acquires a global lock on DDL operations the first time in a transaction where schema changes are made. This effectively serializes the DDL-executing transactions in the cluster. In other words, while DDL is running, no other connection on any node can run another DDL command, even if it affects different tables.

To acquire a lock on DDL operations, the BDR node executing DDL contacts the other nodes in a BDR group and asks them to grant it the exclusive right to execute DDL. The lock request is sent by the regular replication stream, and the nodes respond by the replication stream as well. So it's important that nodes (or at least a majority of the nodes) run without much replication delay. Otherwise it might take a long time for the node to acquire the DDL lock. Once the majority of nodes agrees, the DDL execution is carried out.

The ordering of DDL locking is decided using the Raft protocol. DDL statements executed on one node are executed in the same sequence on all other nodes.

To ensure that the node running a DDL has seen effects of all prior DDLs run in the cluster, it waits until it has caught up with the node that ran the previous DDL. If the node running the current DDL is lagging behind in replication with respect to the node that ran the previous DDL, then it might take a long time to acquire the lock. Hence it's preferable to run DDLs from a single node or the nodes that have nearly caught up with replication changes originating at other nodes.

The second kind is known as a relation DML lock. This kind of lock is used when either ddl_locking = on or ddl_locking = dml, and the DDL statement might cause in-flight DML statements to fail. These failures can occur when you add or modify a constraint such as a unique constraint, check constraint, or NOT NULL constraint. Relation DML locks affect only one relation at a time. Relation DML locks ensure that no DDL executes while there are changes in the queue that might cause replication to halt with an error.

To acquire the global DML lock on a table, the BDR node executing the DDL contacts all other nodes in a BDR group, asking them to lock the table against writes and waiting while all pending changes to that table are drained. Once all nodes are fully caught up, the originator of the DML lock is free to perform schema changes to the table and replicate them to the other nodes.

The global DML lock holds an EXCLUSIVE LOCK on the table on each node, so it blocks DML, other DDL, VACUUM, and index commands against that table while it runs. This is true even if the global DML lock is held for a command that normally doesn't take an EXCLUSIVE LOCK or higher.

Waiting for pending DML operations to drain can take a long time and even longer if replication is currently lagging. This means that schema changes affecting row representation and constraints, unlike with data changes, can be performed only while all configured nodes can be reached and are keeping up reasonably well with the current write rate. If such DDL commands must be performed while a node is down, first remove the down node from the configuration.

If a DDL statement isn't replicated, no global locks are acquired.

Locking behavior is specified by the bdr.ddl_locking parameter, as explained in Executing DDL on BDR systems:

  • ddl_locking = on takes global DDL lock and, if needed, takes relation DML lock.
  • ddl_locking = dml skips global DDL lock and, if needed, takes relation DML lock.
  • ddl_locking = off skips both global DDL lock and relation DML lock.

Some BDR functions make DDL changes. For those functions, DDL locking behavior applies. This is noted in the docs for each function.

Thus, ddl_locking = dml is safe only when you can guarantee that no conflicting DDL is executed from other nodes. With this setting, the statements that require only the global DDL lock don't use the global locking at all.

ddl_locking = off is safe only when you can guarantee that there are no conflicting DDL and no conflicting DML operations on the database objects DDL executes on. If you turn locking off and then experience difficulties, you might lose in-flight changes to data. The user application team needs to resolve any issues caused.

In some cases, concurrently executing DDL can properly be serialized. If these serialization failures occur, the DDL might reexecute.

DDL replication isn't active on logical standby nodes until they are promoted.

Some BDR management functions act like DDL, meaning that they attempt to take global locks, and their actions are replicated if DDL replication is active. The full list of replicated functions is listed in BDR functions that behave like DDL.

DDL executed on temporary tables never need global locks.

ALTER or DROP of an object created in the current transaction doesn't required global DML lock.

Monitoring of global DDL locks and global DML locks is shown in Monitoring.

Minimizing the impact of DDL

Good operational advice for any database, these points become even more important with BDR:

  • To minimize the impact of DDL, make transactions performing DDL short, don't combine them with lots of row changes, and avoid long running foreign key or other constraint rechecks.

  • For ALTER TABLE, use ADD CONSTRAINT NOT VALID followed by another transaction with VALIDATE CONSTRAINT rather than using ADD CONSTRAINT alone. VALIDATE CONSTRAINT waits until replayed on all nodes, which gives a noticeable delay to receive confirmations.

  • When indexing, use the CONCURRENTLY option whenever possible.

An alternate way of executing long-running DDL is to disable DDL replication and then to execute the DDL statement separately on each node. You can still do this using a single SQL statement, as shown in the following example. Global locking rules still apply, so be careful not to lock yourself out with this type of usage, which is more of a workaround.

SELECT bdr.run_on_all_nodes($ddl$
        CREATE INDEX CONCURRENTLY index_a ON table_a(i);
$ddl$);

We recommend using the bdr.run_on_all_nodes() technique with CREATE INDEX CONCURRENTLY, noting that DDL replication must be disabled for the whole session because CREATE INDEX CONCURRENTLY is a multi-transaction command. Avoid CREATE INDEX on production systems since it prevents writes while it executes. REINDEX is replicated in versions up to 3.6 but not with BDR 3.7 or later. Avoid using REINDEX because of the AccessExclusiveLocks it holds.

Instead, use REINDEX CONCURRENTLY (or reindexdb --concurrently), which is available in PG12+ or 2QPG11+.

You can disable DDL replication when using command-line utilities like this:

$ export PGOPTIONS="-c bdr.ddl_replication=off"
$ pg_restore --section=post-data

Multiple DDL statements might benefit from bunching into a single transaction rather than fired as individual statements, so take the DDL lock only once. This might not be desirable if the table-level locks interfere with normal operations.

If DDL is holding up the system for too long, you can safely cancel the DDL on the originating node with Control-C in psql or with pg_cancel_backend(). You can't cancel a DDL lock from any other node.

You can control how long the global lock takes with optional global locking timeout settings. bdr.global_lock_timeout limits how long the wait for acquiring the global lock can take before it's canceled. bdr.global_lock_statement_timeout limits the runtime length of any statement in transaction that holds global locks, and bdr.global_lock_idle_timeout sets the maximum allowed idle time (time between statements) for a transaction holding any global locks. You can disable all of these timeouts by setting their values to zero.

Once the DDL operation has committed on the originating node, you can't cancel or abort it. The BDR group must wait for it to apply successfully on other nodes that confirmed the global lock and for them to acknowledge replay. For this reason, keep DDL transactions short and fast.

Handling DDL with down nodes

If the node initiating the global DDL lock goes down after it acquired the global lock (either DDL or DML), the lock stays active. The global locks don't time out, even if timeouts were set. In case the node comes back up, it releases all the global locks that it holds.

If it stays down for a long time (or indefinitely), remove the node from the BDR group to release the global locks. This is one reason for executing emergency DDL using the SET command as the bdr_superuser to update the bdr.ddl_locking value.

If one of the other nodes goes down after it confirmed the global lock but before the command acquiring it executed, the execution of that command requesting the lock continues as if the node were up.

As mentioned earlier, the global DDL lock requires only a majority of the nodes to respond, and so it works if part of the cluster is down, as long as a majority is running and reachable. But the DML lock can't be acquired unless the whole cluster is available.

With global DDL or global DML lock, if another node goes down, the command continues normally, and the lock is released.

Statement-specific DDL replication concerns

Not all commands can be replicated automatically. Such commands are generally disallowed, unless DDL replication is turned off by turning bdr.ddl_replication off.

BDR prevents some DDL statements from running when it's active on a database. This protects the consistency of the system by disallowing statements that can't be replicated correctly or for which replication isn't yet supported.

If a statement isn't permitted under BDR, you can often find another way to do the same thing. For example, you can't do an ALTER TABLE, which adds a column with a volatile default value. But generally you can rephrase that as a series of independent ALTER TABLE and UPDATE statements that work.

Generally unsupported statements are prevented from being executed, raising a feature_not_supported (SQLSTATE 0A000) error.

Any DDL that references or relies on a temporary object can't be replicated by BDR and throws an error if executed with DDL replication enabled.

BDR DDL command handling matrix

The following table describes the utility or DDL commands that are allowed, the ones that are replicated, and the type of global lock they take when they're replicated.

For some more complex statements like ALTER TABLE, these can differ depending on the subcommands executed. Every such command has detailed explanation under the following table.

CommandAllowedReplicatedLock
ALTER AGGREGATEYYDDL
ALTER CASTYYDDL
ALTER COLLATIONYYDDL
ALTER CONVERSIONYYDDL
ALTER DATABASEYNN
ALTER DATABASE LINKYYDDL
ALTER DEFAULT PRIVILEGESYYDDL
ALTER DIRECTORYYYDDL
ALTER DOMAINYYDDL
ALTER EVENT TRIGGERYYDDL
ALTER EXTENSIONYYDDL
ALTER FOREIGN DATA WRAPPERYYDDL
ALTER FOREIGN TABLEYYDDL
ALTER FUNCTIONYYDDL
ALTER INDEXYYDDL
ALTER LANGUAGEYYDDL
ALTER LARGE OBJECTNNN
ALTER MATERIALIZED VIEWYNN
ALTER OPERATORYYDDL
ALTER OPERATOR CLASSYYDDL
ALTER OPERATOR FAMILYYYDDL
ALTER PACKAGEYYDDL
ALTER POLICYYYDDL
ALTER PROCEDUREYYDDL
ALTER PROFILEYYDDL
ALTER PUBLICATIONYYDDL
ALTER QUEUEYYDDL
ALTER QUEUE TABLEYYDDL
ALTER REDACTION POLICYYYDDL
ALTER RESOURCE GROUPYNN
ALTER ROLEYYDDL
ALTER ROUTINEYYDDL
ALTER RULEYYDDL
ALTER SCHEMAYYDDL
ALTER SEQUENCEDetailsYDML
ALTER SERVERYYDDL
ALTER SESSIONYNN
ALTER STATISTICSYYDDL
ALTER SUBSCRIPTIONYYDDL
ALTER SYNONYMYYDDL
ALTER SYSTEMYNN
ALTER TABLEDetailsYDetails
ALTER TABLESPACEYNN
ALTER TEXT SEARCH CONFIGURATIONYYDDL
ALTER TEXT SEARCH DICTIONARYYYDDL
ALTER TEXT SEARCH PARSERYYDDL
ALTER TEXT SEARCH TEMPLATEYYDDL
ALTER TRIGGERYYDDL
ALTER TYPEYYDDL
ALTER USER MAPPINGYYDDL
ALTER VIEWYYDDL
ANALYZEYNN
BEGINYNN
CHECKPOINTYNN
CLOSEYNN
CLOSE CURSORYNN
CLOSE CURSOR ALLYNN
CLUSTERYNN
COMMENTYDetailsDDL
COMMITYNN
COMMIT PREPAREDYNN
COPYYNN
COPY FROMYNN
CREATE ACCESS METHODYYDDL
CREATE AGGREGATEYYDDL
CREATE CASTYYDDL
CREATE COLLATIONYYDDL
CREATE CONSTRAINTYYDDL
CREATE CONVERSIONYYDDL
CREATE DATABASEYNN
CREATE DATABASE LINKYYDDL
CREATE DIRECTORYYYDDL
CREATE DOMAINYYDDL
CREATE EVENT TRIGGERYYDDL
CREATE EXTENSIONYYDDL
CREATE FOREIGN DATA WRAPPERYYDDL
CREATE FOREIGN TABLEYYDDL
CREATE FUNCTIONYYDDL
CREATE INDEXYYDML
CREATE LANGUAGEYYDDL
CREATE MATERIALIZED VIEWYNN
CREATE OPERATORYYDDL
CREATE OPERATOR CLASSYYDDL
CREATE OPERATOR FAMILYYYDDL
CREATE PACKAGEYYDDL
CREATE PACKAGE BODYYYDDL
CREATE POLICYYYDML
CREATE PROCEDUREYYDDL
CREATE PROFILEYYDDL
CREATE PUBLICATIONYYDDL
CREATE QUEUEYYDDL
CREATE QUEUE TABLEYYDDL
CREATE REDACTION POLICYYYDDL
CREATE RESOURCE GROUPYNN
CREATE ROLEYYDDL
CREATE ROUTINEYYDDL
CREATE RULEYYDDL
CREATE SCHEMAYYDDL
CREATE SEQUENCEDetailsYDDL
CREATE SERVERYYDDL
CREATE STATISTICSYYDDL
CREATE SUBSCRIPTIONYYDDL
CREATE SYNONYMYYDDL
CREATE TABLEDetailsYDDL
CREATE TABLE ASDetailsYDDL
CREATE TABLESPACEYNN
CREATE TEXT SEARCH CONFIGURATIONYYDDL
CREATE TEXT SEARCH DICTIONARYYYDDL
CREATE TEXT SEARCH PARSERYYDDL
CREATE TEXT SEARCH TEMPLATEYYDDL
CREATE TRANSFORMYYDDL
CREATE TRIGGERYYDDL
CREATE TYPEYYDDL
CREATE TYPE BODYYYDDL
CREATE USER MAPPINGYYDDL
CREATE VIEWYYDDL
DEALLOCATEYNN
DEALLOCATE ALLYNN
DECLARE CURSORYNN
DISCARDYNN
DISCARD ALLYNN
DISCARD PLANSYNN
DISCARD SEQUENCESYNN
DISCARD TEMPYNN
DOYNN
DROP ACCESS METHODYYDDL
DROP AGGREGATEYYDDL
DROP CASTYYDDL
DROP COLLATIONYYDDL
DROP CONSTRAINTYYDDL
DROP CONVERSIONYYDDL
DROP DATABASEYNN
DROP DATABASE LINKYYDDL
DROP DIRECTORYYYDDL
DROP DOMAINYYDDL
DROP EVENT TRIGGERYYDDL
DROP EXTENSIONYYDDL
DROP FOREIGN DATA WRAPPERYYDDL
DROP FOREIGN TABLEYYDDL
DROP FUNCTIONYYDDL
DROP INDEXYYDDL
DROP LANGUAGEYYDDL
DROP MATERIALIZED VIEWYNN
DROP OPERATORYYDDL
DROP OPERATOR CLASSYYDDL
DROP OPERATOR FAMILYYYDDL
DROP OWNEDYYDDL
DROP PACKAGEYYDDL
DROP PACKAGE BODYYYDDL
DROP POLICYYYDDL
DROP PROCEDUREYYDDL
DROP PROFILEYYDDL
DROP PUBLICATIONYYDDL
DROP QUEUEYYDDL
DROP QUEUE TABLEYYDDL
DROP REDACTION POLICYYYDDL
DROP RESOURCE GROUPYNN
DROP ROLEYYDDL
DROP ROUTINEYYDDL
DROP RULEYYDDL
DROP SCHEMAYYDDL
DROP SEQUENCEYYDDL
DROP SERVERYYDDL
DROP STATISTICSYYDDL
DROP SUBSCRIPTIONYYDDL
DROP SYNONYMYYDDL
DROP TABLEYYDML
DROP TABLESPACEYNN
DROP TEXT SEARCH CONFIGURATIONYYDDL
DROP TEXT SEARCH DICTIONARYYYDDL
DROP TEXT SEARCH PARSERYYDDL
DROP TEXT SEARCH TEMPLATEYYDDL
DROP TRANSFORMYYDDL
DROP TRIGGERYYDDL
DROP TYPEYYDDL
DROP TYPE BODYYYDDL
DROP USER MAPPINGYYDDL
DROP VIEWYYDDL
EXECUTEYNN
EXPLAINYDetailsDetails
FETCHYNN
GRANTYDetailsDDL
GRANT ROLEYYDDL
IMPORT FOREIGN SCHEMAYYDDL
LISTENYNN
LOADYNN
LOAD ROW DATAYYDDL
LOCK TABLEYNDetails
MOVEYNN
NOTIFYYNN
PREPAREYNN
PREPARE TRANSACTIONYNN
REASSIGN OWNEDYYDDL
REFRESH MATERIALIZED VIEWYNN
REINDEXYNN
RELEASEYNN
RESETYNN
REVOKEYDetailsDDL
REVOKE ROLEYYDDL
ROLLBACKYNN
ROLLBACK PREPAREDYNN
SAVEPOINTYNN
SECURITY LABELYDetailsDDL
SELECT INTODetailsYDDL
SETYNN
SET CONSTRAINTSYNN
SHOWYNN
START TRANSACTIONYNN
TRUNCATE TABLEYDetailsDetails
UNLISTENYNN
VACUUMYNN

ALTER SEQUENCE

Generally ALTER SEQUENCE is supported, but when using global sequences, some options have no effect.

ALTER SEQUENCE ... RENAME isn't supported on galloc sequences (only). ALTER SEQUENCE ... SET SCHEMA isn't supported on galloc sequences (only).

ALTER TABLE

Generally, ALTER TABLE commands are allowed. However, several subcommands aren't supported.

ALTER TABLE disallowed commands

Some variants of ALTER TABLE currently aren't allowed on a BDR node:

  • ADD COLUMN ... DEFAULT (non-immutable expression) This is not allowed because it currently results in different data on different nodes. See Adding a column for a suggested workaround.
  • ADD CONSTRAINT ... EXCLUDE Exclusion constraints aren't supported for now. Exclusion constraints don't make much sense in an asynchronous system and lead to changes that can't be replayed.
  • ALTER TABLE ... SET WITH[OUT] OIDS Isn't supported for the same reasons as in CREATE TABLE.
  • ALTER COLUMN ... SET STORAGE external Is rejected if the column is one of the columns of the replica identity for the table.
  • RENAME Can't rename an Autopartitioned table.
  • SET SCHEMA Can't set the schema of an Autopartitioned table.
  • ALTER COLUMN ... TYPE Changing a column's type isn't supported if the command causes the whole table to be rewritten, which occurs when the change isn't binary coercible. Binary coercible changes might be allowed only one way. For example, the change from VARCHAR(128) to VARCHAR(256) is binary coercible and therefore allowed, whereas the change VARCHAR(256) to VARCHAR(128) isn't binary coercible and therefore normally disallowed. Nonreplicated ALTER COLUMN ... TYPE, can be allowed if the column is automatically castable to the new type (it doesn't contain the USING clause). An example follows. Table rewrites hold an AccessExclusiveLock for extended periods on larger tables, so such commands are likely to be infeasible on highly available databases in any case. See Changing a column's type for a suggested workaround.
  • ALTER TABLE ... ADD FOREIGN KEY Isn't supported if current user doesn't have permission to read the referenced table or if the referenced table has RLS restrictions enabled that the current user can't bypass.

The following example fails because it tries to add a constant value of type timestamp onto a column of type timestamptz. The cast between timestamp and timestamptz relies on the time zone of the session and so isn't immutable.

ALTER TABLE foo
  ADD expiry_date timestamptz DEFAULT timestamp '2100-01-01 00:00:00' NOT NULL;

Starting in BDR 3.7.4, you can add certain types of constraints, such as CHECK and FOREIGN KEY constraints, without taking a DML lock. But this requires a two-step process of first creating a NOT VALID constraint and then validating the constraint in a separate transaction with the ALTER TABLE ... VALIDATE CONSTRAINT command. See Adding a CONSTRAINT for more details.

ALTER TABLE locking

The following variants of ALTER TABLE take only DDL lock and not a DML lock:

  • ALTER TABLE ... ADD COLUMN ... (immutable) DEFAULT
  • ALTER TABLE ... ALTER COLUMN ... SET DEFAULT expression
  • ALTER TABLE ... ALTER COLUMN ... DROP DEFAULT
  • ALTER TABLE ... ALTER COLUMN ... TYPE if it doesn't require rewrite
  • ALTER TABLE ... ALTER COLUMN ... SET STATISTICS
  • ALTER TABLE ... VALIDATE CONSTRAINT
  • ALTER TABLE ... ATTACH PARTITION
  • ALTER TABLE ... DETACH PARTITION
  • ALTER TABLE ... ENABLE TRIGGER (ENABLE REPLICA TRIGGER still takes a DML lock)
  • ALTER TABLE ... CLUSTER ON
  • ALTER TABLE ... SET WITHOUT CLUSTER
  • ALTER TABLE ... SET ( storage_parameter = value [, ... ] )
  • ALTER TABLE ... RESET ( storage_parameter = [, ... ] )
  • ALTER TABLE ... OWNER TO

All other variants of ALTER TABLE take a DML lock on the table being modified. Some variants of ALTER TABLE have restrictions, noted below.

ALTER TABLE examples

This next example works because the type change is binary coercible and so doesn't cause a table rewrite. It executes as a catalog-only change.

CREATE TABLE foo (id BIGINT PRIMARY KEY, description VARCHAR(20));
ALTER TABLE foo ALTER COLUMN description TYPE VARCHAR(128);

However, making this change to reverse the command isn't possible because the change from VARCHAR(128) to VARCHAR(20) isn't binary coercible.

ALTER TABLE foo ALTER COLUMN description TYPE VARCHAR(20);

For workarounds, see Restricted DDL workarounds.

It's useful to provide context for different types of ALTER TABLE ... ALTER COLUMN TYPE (ATCT) operations that are possible in general and in nonreplicated environments.

Some ATCT operations update only the metadata of the underlying column type and don't require a rewrite of the underlying table data. This is typically the case when the existing column type and the target type are binary coercible. For example:

CREATE TABLE sample (col1 BIGINT PRIMARY KEY, col2 VARCHAR(128), col3 INT);
ALTER TABLE sample ALTER COLUMN col2 TYPE VARCHAR(256);

You can also change the column type to VARCHAR or TEXT data types because of binary coercibility. Again, this is just a metadata update of the underlying column type.

ALTER TABLE sample ALTER COLUMN col2 TYPE VARCHAR;
ALTER TABLE sample ALTER COLUMN col2 TYPE TEXT;

However, if you want to reduce the size of col2, then that leads to a rewrite of the underlying table data. Rewrite of a table is normally restricted.

ALTER TABLE sample ALTER COLUMN col2 TYPE VARCHAR(64);
ERROR:  ALTER TABLE ... ALTER COLUMN TYPE that rewrites table data may not affect replicated tables on a BDR node

To give an example with nontext types, consider col3 above with type INTEGER. An ATCT operation that tries to convert to SMALLINT or BIGINT fails in a similar manner as above.

ALTER TABLE sample ALTER COLUMN col3 TYPE bigint;
ERROR:  ALTER TABLE ... ALTER COLUMN TYPE that rewrites table data may not affect replicated tables on a BDR node

In both of these failing cases, there's an automatic assignment cast from the current types to the target types. However, there's no binary coercibility, which ends up causing a rewrite of the underlying table data.

In such cases, in controlled DBA environments, you can change the type of a column to an automatically castable one by adopting a rolling upgrade for the type of this column in a nonreplicated environment on all the nodes, one by one. Suppose the DDL isn't replicated and the change of the column type is to an automatically castable one. You can then allow the rewrite locally on the node performing the alter, along with concurrent activity on other nodes on this same table. You can then repeat this nonreplicated ATCT operation on all the nodes one by one to bring about the desired change of the column type across the entire EDB Postgres Distributed cluster. Because this involves a rewrite, the activity still takes the DML lock for a brief period and thus requires that the whole cluster is available. With these specifics in place, you can carry out the rolling upgrade of the nonreplicated alter activity like this:

-- foreach node in EDB Postgres Distributed cluster do:
SET bdr.ddl_replication TO FALSE;
ALTER TABLE sample ALTER COLUMN col2 TYPE VARCHAR(64);
ALTER TABLE sample ALTER COLUMN col3 TYPE BIGINT;
RESET bdr.ddl_replication;
-- done

Due to automatic assignment casts being available for many data types, this local nonreplicated ATCT operation supports a wide variety of conversions. Also, ATCT operations that use a USING clause are likely to fail because of the lack of automatic assignment casts. This example shows a few common conversions with automatic assignment casts:

-- foreach node in EDB Postgres Distributed cluster do:
SET bdr.ddl_replication TO FALSE;
ATCT operations to-from {INTEGER, SMALLINT, BIGINT}
ATCT operations to-from {CHAR(n), VARCHAR(n), VARCHAR, TEXT}
ATCT operations from numeric types to text types
RESET bdr.ddl_replication;
-- done

This example isn't an exhaustive list of possibly allowable ATCT operations in a nonreplicated environment. Not all ATCT operations work. The cases where no automatic assignment is possible fail even if you disable DDL replication. So, while conversion from numeric types to text types works in a nonreplicated environment, conversion back from text type to numeric types fails.

SET bdr.ddl_replication TO FALSE;
-- conversion from BIGINT to TEXT works
ALTER TABLE sample ALTER COLUMN col3 TYPE TEXT;
-- conversion from TEXT back to BIGINT fails
ALTER TABLE sample ALTER COLUMN col3 TYPE BIGINT;
ERROR:  ALTER TABLE ... ALTER COLUMN TYPE which cannot be automatically cast to new type may not affect replicated tables on a BDR node
RESET bdr.ddl_replication;

While the ATCT operations in nonreplicated environments support a variety of type conversions, the rewrite can still fail if the underlying table data contains values that you can't assign to the new data type. For example, suppose the current type for a column is VARCHAR(256) and you try a nonreplicated ATCT operation to convert it into VARCHAR(128). If there's any existing data in the table that's wider than 128 bytes, then the rewrite operation fails locally.

INSERT INTO sample VALUES (1, repeat('a', 200), 10);
SET bdr.ddl_replication TO FALSE;
ALTER TABLE sample ALTER COLUMN col2 TYPE VARCHAR(128);
INFO:  in rewrite
ERROR:  value too long for type character varying(128)

If underlying table data meets the characteristics of the new type, then the rewrite succeeds. However, replication might fail if other nodes that haven't yet performed the nonreplicated rolling data type upgrade introduce new data that is wider than 128 bytes concurrently to this local ATCT operation. This brings replication to a halt in the cluster. So be aware of the data type restrictions and characteristics at the database and application levels while performing these nonreplicated rolling data type upgrade operations. We strongly recommend that you perform and test such ATCT operations in controlled and fully aware DBA environments. These ATCT operations are asymmetric, and backing out certain changes that fail can lead to table rewrites that take a long time.

Also, you can't perform the implicit castable ALTER activity in transaction blocks.

ALTER TYPE

ALTER TYPE is replicated, but a global DML lock isn't applied to all tables that use that data type, since PostgreSQL doesn't record those dependencies. See Restricted DDL workarounds.

COMMENT ON

All variants of COMMENT ON are allowed, but COMMENT ON TABLESPACE/DATABASE/LARGE OBJECT isn't replicated.

CREATE SEQUENCE

Generally CREATE SEQUENCE is supported, but when using global sequences, some options have no effect.

CREATE TABLE

Generally CREATE TABLE is supported, but CREATE TABLE WITH OIDS isn't allowed on a BDR node.

CREATE TABLE AS and SELECT INTO

CREATE TABLE AS and SELECT INTO are allowed only if all subcommands are also allowed.

EXPLAIN

Generally EXPLAIN is allowed, but because EXPLAIN ANALYZE can have side effects on the database, there are some restrictions on it.

EXPLAIN ANALYZE Replication

EXPLAIN ANALYZE follows replication rules of the analyzed statement.

EXPLAIN ANALYZE Locking

EXPLAIN ANALYZE follows locking rules of the analyzed statement.

GRANT and REVOKE

Generally GRANT and REVOKE statements are supported, however GRANT/REVOKE ON TABLESPACE/LARGE OBJECT aren't replicated.

LOCK TABLE

LOCK TABLE isn't replicated, but it might acquire the global DML lock when bdr.lock_table_locking is set on.

You can also use The bdr.global_lock_table() function to explicitly request a global DML lock.

SECURITY LABEL

All variants of SECURITY LABEL are allowed, but SECURITY LABEL ON TABLESPACE/DATABASE/LARGE OBJECT isn't replicated.

TRUNCATE Replication

TRUNCATE command is replicated as DML, not as a DDL statement. Whether the TRUNCATE on table is replicated depends on replication settings for each affected table.

TRUNCATE Locking

Even though TRUNCATE isn't replicated the same way as other DDL, it can acquire the global DML lock when bdr.truncate_locking is set to on.

Role manipulation statements

Users are global objects in a PostgreSQL instance, which means they span multiple databases while BDR operates on an individual database level. This means that role manipulation statement handling needs extra thought.

BDR requires that any roles that are referenced by any replicated DDL must exist on all nodes. The roles don't have to have the same grants, password, and so on, but they must exist.

BDR replicates role manipulation statements if bdr.role_replication is enabled (default) and role manipulation statements are run in a BDR-enabled database.

The role manipulation statements include the following:

  • CREATE ROLE
  • ALTER ROLE
  • DROP ROLE
  • GRANT ROLE
  • CREATE USER
  • ALTER USER
  • DROP USER
  • CREATE GROUP
  • ALTER GROUP
  • DROP GROUP

In general, either:

  • Configure the system with bdr.role_replication = off and deploy all role changes (user and group) by external orchestration tools like Ansible, Puppet, and Chef or explicitly replicated by bdr.replicate_ddl_command(...).

  • Configure the system so that exactly one BDR-enabled database on the PostgreSQL instance has bdr.role_replication = on and run all role management DDL on that database.

We recommended that you run all role management commands in one database.

If role replication is turned off, then the administrator must ensure that any roles used by DDL on one node also exist on the other nodes. Otherwise BDR apply stalls with an error until the role is created on the other nodes.

BDR doesn't capture and replicate role management statements when they run on a non-BDR-enabled database in a BDR-enabled PostgreSQL instance. For example, if you have DBs 'bdrdb' (bdr group member) and 'postgres' (bare db), and bdr.role_replication = on, then a CREATE USER run in bdrdb is replicated, but a CREATE USER run in postgres isn't.

Restricted DDL workarounds

Some of the limitations of BDR DDL operation handling can be worked around. Often splitting up the operation into smaller changes can produce the desired result that either isn't allowed as a single statement or requires excessive locking.

Adding a CONSTRAINT

You can add CHECK and FOREIGN KEY constraints without requiring a DML lock. This involves a two-step process.

  • ALTER TABLE ... ADD CONSTRAINT ... NOT VALID
  • ALTER TABLE ... VALIDATE CONSTRAINT

Execute these steps in two different transactions. Both these steps take DDL lock only on the table and hence can be run even when one or more nodes are down. But to validate a constraint, BDR must ensure that:

  • All nodes in the cluster see the ADD CONSTRAINT command.
  • The node validating the constraint applied replication changes from all other nodes prior to creating the NOT VALID constraint on those nodes.

So even though the new mechanism doesn't need all nodes to be up while validating the constraint, it still requires that all nodes applied the ALTER TABLE .. ADD CONSTRAINT ... NOT VALID command and made enough progress. BDR waits for a consistent state to be reached before validating the constraint.

The new facility requires the cluster to run with Raft protocol version 24 and beyond. If the Raft protocol isn't yet upgraded, the old mechanism is used, resulting in a DML lock request.

Adding a column

To add a column with a volatile default, run these commands in separate transactions:

    ALTER TABLE mytable ADD COLUMN newcolumn coltype; -- Note the lack of DEFAULT or NOT NULL

    ALTER TABLE mytable ALTER COLUMN newcolumn DEFAULT volatile-expression;

	BEGIN;
	SELECT bdr.global_lock_table('mytable');
    UPDATE mytable SET newcolumn = default-expression;
	COMMIT;

This approach splits schema changes and row changes into separate transactions that BDR can execute and results in consistent data across all nodes in a BDR group.

For best results, batch the update into chunks so that you don't update more than a few tens or hundreds of thousands of rows at once. You can do this using a PROCEDURE with embedded transactions.

The last batch of changes must run in a transaction that takes a global DML lock on the table. Otherwise you can miss rows that are inserted concurrently into the table on other nodes.

If required, you can run ALTER TABLE mytable ALTER COLUMN newcolumn NOT NULL; after the UPDATE has finished.

Changing a column's type

PostgreSQL causes a table rewrite in some cases where it could be avoided, for example:

CREATE TABLE foo (id BIGINT PRIMARY KEY, description VARCHAR(128));
ALTER TABLE foo ALTER COLUMN description TYPE VARCHAR(20);

You can rewrite this statement to avoid a table rewrite by making the restriction a table constraint rather than a datatype change. The constraint can then be validated in a subsequent command to avoid long locks, if you want.

CREATE TABLE foo (id BIGINT PRIMARY KEY, description VARCHAR(128));
ALTER TABLE foo
  ALTER COLUMN description TYPE varchar,
  ADD CONSTRAINT description_length_limit CHECK (length(description) <= 20) NOT VALID;
ALTER TABLE foo VALIDATE CONSTRAINT description_length_limit;

If the validation fails, then you can UPDATE just the failing rows. You can use this technique for TEXT and VARCHAR using length() or with NUMERIC datatype using scale().

In the general case for changing column type, first add a column of the desired type:

ALTER TABLE mytable ADD COLUMN newcolumn newtype;

Create a trigger defined as BEFORE INSERT OR UPDATE ON mytable FOR EACH ROW .., which assigns NEW.newcolumn to NEW.oldcolumn so that new writes to the table update the new column automatically.

UPDATE the table in batches to copy the value of oldcolumn to newcolumn using a PROCEDURE with embedded transactions. Batching the work helps reduce replication lag if it's a big table. Updating by range of IDs or whatever method you prefer is fine. Alternatively, you can update the whole table in one pass for smaller tables.

CREATE INDEX ... any required indexes on the new column. It's safe to use CREATE INDEX ... CONCURRENTLY run individually without DDL replication on each node to reduce lock durations.

ALTER the column to add a NOT NULL and CHECK constraints, if required.

  1. BEGIN a transaction.
  2. DROP the trigger you added.
  3. ALTER TABLE to add any DEFAULT required on the column.
  4. DROP the old column.
  5. ALTER TABLE mytable RENAME COLUMN newcolumn TO oldcolumn.
  6. COMMIT.

Because you're dropping a column, you might have to re-create views, procedures, and so on that depend on the table. Be careful if you CASCADE drop the column, as you must be sure you re-create everything that referred to it.

Changing other types

The ALTER TYPE statement is replicated, but affected tables aren't locked.

When this DDL is used, ensure that the statement has successfully executed on all nodes before using the new type. You can achieve this using the bdr.wait_slot_confirm_lsn() function.

This example ensures that the DDL is written to all nodes before using the new value in DML statements:

ALTER TYPE contact_method ADD VALUE 'email';
SELECT bdr.wait_slot_confirm_lsn(NULL, NULL);

BDR functions that behave like DDL

The following BDR management functions act like DDL. This means that, if DDL replication is active and DDL filter settings allow it, they attempt to take global locks and their actions are replicate. For detailed information, see the documentation for the individual functions.

Replication set management

  • bdr.create_replication_set
  • bdr.alter_replication_set
  • bdr.drop_replication_set
  • bdr.replication_set_add_table
  • bdr.replication_set_remove_table
  • bdr.replication_set_add_ddl_filter
  • bdr.replication_set_remove_ddl_filter

Conflict management

  • bdr.alter_table_conflict_detection
  • bdr.column_timestamps_enable
  • bdr.column_timestamps_disable

Sequence management

  • bdr.alter_sequence_set_kind

Stream triggers

  • bdr.create_conflict_trigger
  • bdr.create_transform_trigger
  • bdr.drop_trigger