I recently had a few customers ask me about using a query for monitoring SMR XDB lag, so they can keep tabs on replication progress and set up notifications.
Unfortunately, because of the way WAL works, it's a bit tedious to try to quantify lag in terms of rows (You'd have to take each WAL file, scan, and filter for unrelated WAL entries, based on sync_ids in the XDB control database). Additionally, tracking WAL-based SMR replication lag cannot be done with an earlier solution. Fortunately, it is possible to calculate the time lag with a query:
WITH src_db (current_xid_commit_timestamp) AS (SELECT timestamp FROM pg_last_committed_xact()), -- last replicated timestamp for target databases target_db (target_db_id, last_repl_xid_timestamp) AS (SELECT sub_db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp FROM _edb_replicator_pub.rrep_txset rt JOIN _edb_replicator_sub.xdb_subscriptions xs ON xs.sub_id=rt.sub_id WHERE status = 'C' GROUP BY sub_db_id), -- identity of target databases target_db_identity (target_db_id, target_db_identity) AS (SELECT sub_db_id, db_host || ':' || db_port || ':' || db_name FROM _edb_replicator_sub.xdb_sub_database) -- replication lag for each of the target databases SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag FROM src_db a, target_db b, target_db_identity c WHERE b.target_db_id = c.target_db_id ORDER BY target_db_id;
By examining the replication timestamps, we are able to get a good handle on the time delay between the associated servers. Additionally, with some more tweaking, we can use a similar query to calculate time lag for Multi-Master Replication (MMR) clusters:
WITH src_db (current_xid_commit_timestamp) AS (SELECT timestamp FROM pg_last_committed_xact()), -- last replicated timestamp for target databases target_db (target_db_id, last_repl_xid_timestamp) AS (SELECT db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp FROM _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C' GROUP BY db_id), -- identity of target databases target_db_identity (target_db_id, target_db_identity) AS (SELECT pub_db_id, db_host || ':' || db_port || ':' || db_name FROM _edb_replicator_pub.xdb_pub_database) -- replication lag for each of the target databases SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag FROM src_db a, target_db b, target_db_identity c WHERE b.target_db_id = c.target_db_id ORDER BY target_db_id;
With these queries, DBAs and sysadmins can easily set up notification and monitoring systems for XDB performance.