Replication sets v3.7

Replication sets provide a mechanism to control which tables in the database will be replicated and which actions on those tables will be replicated.

Each replicated set can specify individually if INSERTs, UPDATEs, DELETEs and TRUNCATEs on the set are replicated. Every table can be in multiple replication sets and every subscriber can subscribe to multiple replication sets as well. The resulting set of tables and actions replicated is the union of the sets the table is in. The tables are not replicated until they are added into a replication set.

There are three preexisting replication sets, named "default", "default_insert_only" and "ddl_sql". The "default" replication set is defined to replicate all changes to tables in it. The "default_insert_only" replication set only replicates INSERTs and is meant for tables that don't have primary key (see Restrictions section for details). The "ddl_sql" replication set is defined to replicate schema changes specified by the pglogical.replicate_ddl_command.

Note: Table are not added automatically to the "default" replication set, the name "default" just means it exists by default. This behavior can be changed using pglogical.alter_replication_set.

Behavior of partitioned tables

From PostgreSQL 11 onwards, pglogical supports partitioned tables transparently. This means that a partitioned table can be added to a replication set and changes to any of the partitions will be replicated downstream.

The partitioning definition on the subscription side can be set up differently to the one on the provider. This means that one can also replicate a partitioned table to a single table, or a single table to a partitioned table, or a partitioned tabled to a differently'partitioned table (repartitioning).

It's also possible to add individual partitions to the replication set, in which case they will be replicated like regular tables (to the table of the same name as the partition on the downstream). This has some performance advantages in case the partitioning definition is same on both provider and subscriber, as the partitioning logic does not have to be executed.

Note: If the root-partitioned table is part of any replication set, memberships of individual partitions are ignored and only the membership of said root table will be taken into account.

Older versions of PostgreSQL

In PostgreSQL 10 and older, pglogical only allows the replication of partitions directly to other partitions. Which means the partitioned table itself cannot be added to a replication set and can't be target of replication on the subscriber either (one can't replicate a normal table to a partitioned table).

Replication set manipulation interfaces

The following functions are provided for managing the replication sets:

pglogical.create_replication_set

This function creates a new replication set.

Synopsis

pglogical.create_replication_set (
    set_name name,
    replicate_insert boolean,
    replicate_update boolean,
    replicate_delete boolean,
    replicate_truncate boolean,
    autoadd_tables boolean,
    autoadd_sequences boolean,
    autoadd_existing boolean
)

Parameters

  • set_name - name of the set, must be unique
  • replicate_insert - specifies if INSERT is replicated; default true
  • replicate_update - specifies if UPDATE is replicated; default true
  • replicate_delete - specifies if DELETE is replicated; default true
  • replicate_truncate - specifies if TRUNCATE is replicated; default true
  • autoadd_tables - specifies if newly created tables should be automatically added to the new replication set; default false
  • autoadd_sequences - specifies if newly created sequences should be automatically added to the new replication set; default false
  • autoadd_existing - this in combination with autoadd_tables or autoadd_sequences specifies if any existing tables and sequences should be added as well

The autoadd options will ignore tables that are in information_schema or pg_catalog schemas or are part of an extension.

The autoadd options will also allow automatic removal of tables from the replication set. So there will be no dependency check on replication membership when the table which is part of the autoadd replication set is being dropped.

If you want to replicate tables which are part of some extension, you still have to add them manually.

pglogical.alter_replication_set

This function changes the parameters of the existing replication set.

Synopsis

pglogical.alter_replication_set (
    set_name name,
    replicate_insert boolean,
    replicate_update boolean,
    replicate_delete boolean,
    replicate_truncate boolean,
    autoadd_tables boolean,
    autoadd_sequences boolean
)

Parameters

  • set_name - name of the existing replication set
  • replicate_insert - specifies if INSERT is replicated
  • replicate_update - specifies if UPDATE is replicated
  • replicate_delete - specifies if DELETE is replicated
  • replicate_truncate - specifies if TRUNCATE is replicated
  • autoadd_tables - specifies if newly created tables should be automatically added to the new replication set
  • autoadd_sequences - specifies if newly created sequences should be automatically added to the new replication set

If any of these replication set parameters is NULL (which is the default value if nothing else is specified), the current setting for that parameter will remain unchanged.

pglogical.drop_replication_set

Removes the replication set.

Synopsis

pglogical.drop_replication_set(set_name text)

Parameters

  • set_name - name of the existing replication set

pglogical.replication_set_add_table

Adds a table to a specified existing replication set, optionally requesting resynchronization by subscribers.

Synopsis

pglogical.replication_set_add_table (
    set_name name,
    relation regclass,
    synchronize_data boolean,
    columns text[],
    row_filter text
)

Parameters

  • set_name - name of the existing replication set
  • relation - name or OID of the table to be added to the set
  • synchronize_data - if true, the table data is synchronized on all subscribers which are subscribed to given replication set; default false
  • columns - list of columns to replicate. Normally when all columns should be replicated, this will be set to NULL which is the default.
  • row_filter - row filtering expression; default NULL (no filtering). See Row Filtering On Provider for more info.

WARNING: Use caution when synchronizing data with a valid row filter. Using synchronize_data=true with a valid row_filter is like a one-time operation for a table. Executing it again with a modified row_filter won't synchronize data to subscriber. Subscribers may need to call pglogical.alter_subscription_resynchronize_table() to fix it.

Also, note that if synchronize_data is enabled, a synchronization request is scheduled on each subscriber and actioned asynchronously. Adding to the replication set does not wait for synchronization to complete.

To wait until the resync has completed, first, on the provider, run:

SELECT pglogical.wait_slot_confirm_lsn(NULL, NULL);

To ensure each subscriber has received the request, then on each subscriber run:

SELECT pglogical.wait_for_subscription_sync_complete('sub_name');

NOTE: There is currently no function to alter the row filter or columns of a table's replication set membership (RM#5960). However, you can use a single transaction to remove the table from the replication set and then re-add it with the desired row filter and column filter. Make sure to set synchronize_data := false. This provides a seamless transition from the old to the new membership and will not skip or lose any rows from concurrent transactions.

pglogical.replication_set_add_all_tables

Adds all tables in given schemas.

Synopsis

pglogical.replication_set_add_all_tables (
    set_name name,
    schema_names text[],
    synchronize_data boolean
)

Only existing tables are added; any tables created later will not be added automatically. To see how to automatically add tables to the correct replication set at creation time, see Automatic assignment of replication sets for new tables.

Parameters

  • set_name - name of the existing replication set
  • schema_names - array of names name of existing schemas from which tables should be added
  • synchronize_data - if true, the table data is synchronized on all subscribers which are subscribed to the given replication set; default false

pglogical.replication_set_remove_table

Removes a table from a specified existing replication set.

Synopsis

pglogical.replication_set_remove_table(set_name name, relation regclass)

Parameters

  • set_name - name of the existing replication set
  • relation - name or OID of the table to be removed from the set

pglogical.replication_set_add_sequence

Adds a sequence to a replication set.

Synopsis

pglogical.replication_set_add_sequence (
    set_name name,
    relation regclass,
    synchronize_data boolean
)

Parameters

  • set_name - name of the existing replication set
  • relation - name or OID of the sequence to be added to the set
  • synchronize_data - if true, the sequence value will be synchronized immediately; default false

pglogical.replication_set_add_all_sequences

Adds all sequences from the given schemas.

Synopsis

pglogical.replication_set_add_all_sequences (
    set_name name,
    schema_names text[],
    synchronize_data boolean
)

Only existing sequences are added; any sequences created later will not be added automatically.

Parameters

  • set_name - name of the existing replication set
  • schema_names - array of names of existing schemas from which tables should be added
  • synchronize_data - if true, the sequence value will be synchronized immediately; default false

pglogical.replication_set_remove_sequence

Remove a sequence from a replication set.

Synopsis

pglogical.replication_set_remove_sequence(set_name name, relation regclass)

Parameters

  • set_name - name of the existing replication set
  • relation - name or OID of the sequence to be removed from the set

You can view the information about which table is in which set by querying the pglogical.tables view.

Automatic assignment of replication sets for new tables

The event trigger facility can be used for describing rules which define replication sets for newly created tables.

Example:

CREATE OR REPLACE FUNCTION pglogical_assign_repset()
RETURNS event_trigger AS $$
DECLARE obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        IF obj.object_type = 'table' THEN
            IF obj.schema_name = 'config' THEN
                PERFORM pglogical.replication_set_add_table('configuration', obj.objid);
            ELSIF NOT obj.in_extension THEN
                PERFORM pglogical.replication_set_add_table('default', obj.objid);
            END IF;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER pglogical_assign_repset_trg
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'CREATE TABLE AS')
    EXECUTE PROCEDURE pglogical_assign_repset();

The above example will put all new tables created in schema config into replication set configuration and all other new tables which are not created by extensions will go to the default replication set.

Additional functions

pglogical.synchronize_sequence

Push sequence state to all subscribers.

Synopsis

pglogical.synchronize_sequence(relation regclass)

Unlike the subscription and table synchronization function, this function should be run on the provider. It forces an update of the tracked sequence state which will be consumed by all subscribers (replication set filtering still applies) once they replicate the transaction in which this function has been executed.

Parameters

  • relation - name of existing sequence, optionally qualified

Row Filtering on Provider

On the provider side, row filtering can be done by specifying the row_filter parameter for the pglogical.replication_set_add_table function. The row_filter is a normal PostgreSQL expression with the same limitations as a CHECK constraint.

You can see which row filters are active in the pglogical.tables view.

The table's column(s) are exposed to the row filter as simple identifiers; there's no qualifier or namespace.

Unlike a CHECK constraint's body, the row-filter is passed as a string which is parsed and checked by pglogical. So to avoid quoting issues you should use PostgreSQL's dollar-quoting, like this:

SELECT pglogical.replication_set_add_table(
'setname', 'tblname'::regclass,
synchronize_data := false,
row_filter := $FILTER$ id > 0 $FILTER$
);

A simple row_filter would look something like row_filter := 'id > 0' which would replicate only those rows where values of column id are greater than zero. This will not affect any already-committed rows pending replication, or any already-replicated rows.

Important: Caveats apply when re-synchronizing tables with row filters using replication_set_add_table. See pglogical.replication_set_add_table.

Writing safer row filters

Be very cautious when writing row filter expressions, and keep them as simple as possible. If a row-filter expression raises an error during replication, it is generally necessary to drop and re-create the subscription, resynchronizing all tables, not just the table with the problem row-filter. So row filters should be simple and defensively written. A non-exhaustive list of rules for writing filters is that they:

  • Should be simple expressions wherever possible. Try to use only built-in PostgreSQL operators and IMMUTABLE functions if you can.

  • Must avoid using any expression that could raise an ERROR at runtime, such as casting from text to a more strictly validated data type. They must tolerate any value that the table's constraints permit to appear in the table.

  • May use VOLATILE or STABLE functions, but any functions must obey the same constraints as the filter expression itself.

    E.g. you can call random() but not txid_current() or my_audit_log_function().

  • May call user-defined functions written in SQL, Pl/PgSQL, or (with care) C. Use of other languages is untested and not recommended. PL/PgSQL functions must not use EXCEPTION blocks, and may have other as-yet-undiscovered issues so their use is not recommended. Stick to SQL where possible.

  • Should not attempt to access any tables. Only the column values should be used.

    Direct use of subqueries in the row-filter expression is blocked.

    It's possible to call a user-defined function within the filter, and that can access table contents. This is not recommended and may be subject to surprising behaviour. The function must only access tables in pg_catalog.* or tables marked with the user_catalog_table=true attribute. Accessing other tables will not raise an error, but may cause undefined behaviour, errors, or crashes.

  • Must never attempt any write operation or anything that assigns a transaction-id. Similar to queries on a read-replica. Attempting writes will break replication.

  • May safely use columns of the filtered table that are not part of the replication set's column list. Filtering happens on the provider side so non-replicated columns will have their values accessible. This lets you do things like pre-compute complex filter criteria in triggers.

  • Should not rely on session state, since the row_filter is running inside the replication session. Session specific expressions such as CURRENT_USER will have values of the replication session and not the session which did the writes. The same is true for GUCs etc.

Changing row filters

To change a row-filter expression on a table, use a single transaction to remove the table from the replication set, then add it again with the new row filter expression. Do not specify data sync and make sure to explicitly repeat the set of replicated columns. You can check the pglogical.tables view for the old column set and row filter.

See pglogical.replication_set_add_table.