Perform PGD management primarily by using functions you call from SQL.
All functions in PGD are exposed in the bdr schema. Schema qualify any calls to these
functions instead of putting bdr in the
search_path.
Version information functions
bdr.bdr_version
This function retrieves the textual representation of the version of the BDR extension currently in use.
bdr.bdr_version_num
This function retrieves the version number of the BDR extension that is currently in use. Version numbers are monotonically increasing, allowing this value to be used for less-than and greater-than comparisons.
The following formula returns the version number consisting of major version, minor version, and patch release into a single numerical value:
MAJOR_VERSION * 10000 + MINOR_VERSION * 100 + PATCH_RELEASE
System information functions
bdr.get_relation_stats
Returns the relation information.
bdr.get_subscription_stats
Returns the current subscription statistics.
System and progress information parameters
PGD exposes some parameters that you can query directly in SQL using, for example,
SHOW or the current_setting() function. You can also use PQparameterStatus
(or equivalent) from a client application.
bdr.local_node_id
When you initialize a session, this is set to the node id the client is connected to. This allows an application to figure out the node it's connected to, even behind a transparent proxy.
It's also used with Connection pools and proxies.
bdr.last_committed_lsn
After every COMMIT of an asynchronous transaction, this parameter is updated to
point to the end of the commit record on the origin node. Combining it with bdr.wait_for_apply_queue,
allows applications
to perform causal reads across multiple nodes, that is, to wait until a transaction
becomes remotely visible.
transaction_id
If a CAMO transaction is in progress, transaction_id is updated to show
the assigned transaction id. You can query this parameter only by using
using PQparameterStatus or equivalent, and it isn't accessible in SQL.
See Application use
for a usage example.
Node status functions
bdr.is_node_connected
Synopsis
bdr.is_node_connected(node_name name)
Returns boolean by checking if the walsender for a given peer is active on this node.
bdr.is_node_ready
Synopsis
bdr.is_node_ready(node_name name, span interval DEFAULT NULL)
Returns boolean by checking if the lag is lower than the given span or
lower than the timeout for TO ASYNC otherwise.
Consensus function
bdr.consensus_disable
Synopsis
bdr.consensus_disable(node_group_name text DEFAULT NULL)
Disables the consensus worker on the local node until server restart or until
it's reenabled using bdr.consensus_enable (whichever happens first).
The node_group_name parameter is accepted for interface consistency but is
currently a placeholder. The function disables the consensus worker
database-wide, affecting all Raft instances on the node regardless of the value
passed.
Warning
Disabling consensus disables some features of PGD and affects availability of the EDB Postgres Distributed cluster if left disabled for a long time. Use this function only when working with Technical Support.
bdr.consensus_enable
Synopsis
bdr.consensus_enable(node_group_name text DEFAULT NULL)
Re-enables a disabled consensus worker on the local node.
The node_group_name parameter is accepted for interface consistency but is
currently a placeholder. The function re-enables the consensus worker
database-wide, affecting all Raft instances on the node regardless of the value
passed.
bdr.consensus_proto_version
Returns currently used consensus protocol version by the local node.
Needed by the PGD group reconfiguration internal mechanisms.
bdr.consensus_snapshot_export
Synopsis
bdr.consensus_snapshot_export(version integer DEFAULT NULL, raft_instance_id integer DEFAULT NULL)
Generate a new PGD consensus snapshot from the currently committed-and-applied state of the local node and return it as bytea.
By default, a snapshot for the highest supported Raft version is
exported. But you can override that by passing an explicit version
number.
The raft_instance_id parameter identifies which Raft instance to export the
snapshot from. When NULL (the default), the top-level Raft instance is used.
To target a specific node group's Raft instance, pass its instance ID, which
you can look up via bdr.raft_instances.
The exporting node doesn't have to be the current Raft leader, and it doesn't
need to be completely up to date with the latest state on the leader. However, bdr.consensus_snapshot_import()
might not accept such a snapshot.
The new snapshot isn't automatically stored to the local node's
bdr.local_consensus_snapshot table. It's only returned to the caller.
The generated snapshot might be passed to bdr.consensus_snapshot_import() on
any other nodes in the same PGD node group that's behind the exporting node's
Raft log position.
The local PGD consensus worker must be disabled for this function to work. Typical usage is:
SELECT bdr.bdr_consensus_disable(); COPY (SELECT * FROM bdr.consensus_snapshot_export()) TO '/var/lib/postgresql/my_node_consensus_snapshot.data'; SELECT bdr.bdr_consensus_enable();
While the PGD consensus worker is disabled:
- DDL locking attempts on the node fail or time out.
- galloc sequences don't get new values.
- Eager and CAMO transactions pause or error.
- Other functionality that needs the distributed consensus system is disrupted. The required downtime is generally very brief.
Depending on the use case, it might be practical to extract a snapshot that
already exists from the snapshot field of the bdr.local_consensus_snapshot
table and use that instead. Doing so doesn't require you to stop the consensus worker.
bdr.consensus_snapshot_import
Synopsis
bdr.consensus_snapshot_import(snapshot bytea, raft_instance_id integer DEFAULT NULL, node_name bdr.regnode DEFAULT NULL)
Import a consensus snapshot that was exported by
bdr.consensus_snapshot_export(), usually from another node in the same PGD
node group.
It's also possible to use a snapshot extracted directly from the snapshot
field of the bdr.local_consensus_snapshot table on another node.
This function is useful for resetting a PGD node's catalog state to a known good state in case of corruption or user error.
The raft_instance_id parameter identifies which Raft instance to import the
snapshot into. When NULL (the default), the top-level Raft instance is used.
To target a specific node group's Raft instance, pass its instance ID, which
you can look up via bdr.raft_instances.
The node_name parameter attributes the snapshot to a specific source node,
typically the leader node from which the snapshot was exported. When NULL
(the default), no source node is attributed.
You can import the snapshot if the importing node's apply_index is less than
or equal to the snapshot-exporting node's commit_index when the
snapshot was generated. (See bdr.get_raft_status().) A node that can't accept
the snapshot because its log is already too far ahead raises an error
and makes no changes. The imported snapshot doesn't have to be completely
up to date, as once the snapshot is imported the node fetches the remaining
changes from the current leader.
The PGD consensus worker must be disabled on the importing node for this
function to work. See notes on bdr.consensus_snapshot_export() for details.
It's possible to use this function to force the local node to generate a new Raft snapshot by running:
SELECT bdr.consensus_snapshot_import(bdr.consensus_snapshot_export());
This approach might also truncate the Raft logs up to the current applied log position.
bdr.consensus_snapshot_verify
Synopsis
bdr.consensus_snapshot_verify(snapshot bytea)
Verify the given consensus snapshot that was exported by
bdr.consensus_snapshot_export(). The snapshot header contains the
version with which it was generated and the node tries to verify it
against the same version.
The snapshot might have been exported on the same node or any other node in the cluster. If the node verifying the snapshot doesn't support the version of the exported snapshot, then an error is raised.
bdr.get_consensus_status
Returns status information about the current consensus (Raft) worker.
bdr.get_raft_status
Returns status information about the current consensus (Raft) worker.
Alias for bdr.get_consensus_status.
bdr.raft_leadership_transfer
Synopsis
bdr.raft_leadership_transfer(node_name text, wait_for_completion boolean, node_group_name text DEFAULT NULL)
Request the node identified by node_name to be the Raft leader. The
request can be initiated from any of the PGD nodes and is
internally forwarded to the current leader to transfer the leadership to
the designated node. The designated node must be an ACTIVE PGD node
with full voting rights.
If wait_for_completion is false, the request is served on
a best-effort basis. If the node can't become a leader in the
bdr.raft_global_lection_timeout period, then some other capable node
becomes the leader again. Also, the leadership can change over the
period of time per Raft protocol. A true return result indicates
only that the request was submitted successfully.
If wait_for_completion is true, then the function waits until
the given node becomes the new leader and possibly waits infinitely if
the requested node fails to become Raft leader (for example, due to network
issues). We therefore recommend that you always set a statement_timeout
with wait_for_completion to prevent an infinite loop.
The node_group_name is optional and can be used to specify the name of the node group where the
leadership transfer happens. If not specified, it defaults to NULL, which
is interpreted as the top-level group in the cluster. If the node_group_name is
specified, the function transfers leadership only within the specified node
group.
Utility functions
bdr.alter_node_kind
PGD5 introduced a concept of Task Manager Leader node. The node is selected by PGD, but for upgraded clusters, it's important to set the node_kind properly for all nodes in the cluster. Do this manually after upgrading to the latest PGD version by calling the bdr.alter_node_kind() SQL function for each node.
Synopsis
bdr.alter_node_kind(node_name text,
node_kind text);Parameters
| Parameter | Description |
|---|---|
node_name | Name of the node to change kind. |
node_kind | Kind of the node. |
bdr.alter_subscription_skip_changes_upto
Because logical replication can replicate across versions, doesn't replicate global changes like roles, and can replicate selectively, sometimes the logical replication apply process can encounter an error and stop applying changes.
Wherever possible, fix such problems by making changes to the target side. CREATE any missing table that's blocking replication, CREATE a needed role, GRANT a necessary permission, and so on. But occasionally a problem can't be fixed that way and it might be necessary to skip entirely over a transaction. Changes are skipped as entire transactions—all or nothing. To decide where to skip to, use log output to find the commit LSN, per the example that follows, or peek the change stream with the logical decoding functions.
Unless a transaction made only one change, you often need to manually apply the transaction's effects on the target side, so it's important to save the problem transaction whenever possible, as shown in the examples that follow.
It's possible to skip over changes without bdr.alter_subscription_skip_changes_upto by using pg_catalog.pg_logical_slot_get_binary_changes to skip to the LSN of interest, so this is a convenience function. It does do a faster skip, although it might bypass some kinds of errors in logical decoding.
This function works only on disabled subscriptions.
The usual sequence of steps is:
- Identify the problem subscription and LSN of the problem commit.
- Disable the subscription.
- Save a copy of the transaction using
pg_catalog.pg_logical_slot_peek_changeson the source node, if possible. bdr.alter_subscription_skip_changes_uptoon the target node.- Apply repaired or equivalent changes on the target manually, if necessary.
- Reenable the subscription.
Warning
It's easy to make problems worse when using this function. Don't do anything unless you're certain it's the only option.
Synopsis
bdr.alter_subscription_skip_changes_upto(
subname text,
skip_upto_and_including pg_lsn
);Example
Apply of a transaction is failing with an error, and you've determined that lower-impact fixes such as changes on the target side can't resolve this issue. You determine that you must skip the transaction.
In the error logs, find the commit record LSN to skip to, as in this example:
ERROR: XX000: CONFLICT: target_table_missing; resolver skip_if_recently_dropped returned an error: table does not exist CONTEXT: during apply of INSERT from remote relation public.break_me in xact with commit-end lsn 0/300AC18 xid 131315 committs 2021-02-02 15:11:03.913792+01 (action #2) (effective sess origin id=2 lsn=0/300AC18) while consuming 'I' message from receiver for subscription bdr_regression_bdrgroup_node1_node2 (id=2667578509) on node node2 (id=3367056606) from upstream node node1 (id=1148549230, reporiginid=2)
In this portion of log, you have the information you need: the_target_lsn: 0/300AC18 the_subscription: bdr_regression_bdrgroup_node1_node2
Next, disable the subscription so the apply worker doesn't try to connect to the replication slot:
SELECT bdr.alter_subscription_disable('the_subscription');You can't skip only parts of the transaction: it's all or nothing. So we strongly recommend that you save a record of it by copying it out on the provider side first, using the subscription's slot name.
\copy (SELECT * FROM pg_catalog.pg_logical_slot_peek_changes('the_slot_name',
'the_target_lsn', NULL, 'min_proto_version', '1', 'max_proto_version', '1',
'startup_params_format', '1', 'proto_format', 'json'))
TO 'transaction_to_drop.csv' WITH (FORMAT csv);This example is broken into multiple lines for readability, but issue it in a single line. \copy doesn't support multi-line commands.
You can skip the change by changing peek to get, but bdr....skip_changes_upto does a faster skip that avoids decoding and outputting all the data:
SELECT bdr.alter_subscription_skip_changes_upto('subscription_name',
'the_target_lsn');You can apply the same changes (or repaired versions of them) manually to the target node, using the dumped transaction contents as a guide.
Finally, reenable the subscription:
SELECT bdr.alter_subscription_enable('the_subscription');bdr.connection_manager_refresh_pools
This function tells the PGD connection manager to refresh its connection pools.
This function is useful for integration with other tools or when an immediate re-evaluation of connection pools is required. This function operates only on the local node; to refresh connection managers across the entire cluster, you must execute the function on each node individually. Execution requires the bdr_superuser role.
Expand commentComment on line R861Resolved
Synopsis
SELECT bdr.connection_manager_refresh_pools();
Note
This function may terminate existing client connections if routing changes are detected (e.g., write leader changed, nodes added/removed from routing targets, or consensus was lost).
bdr.get_node_sub_apply_lsn
You can use this function on a subscriber to get the last LSN that was received and applied from the given origin.