Lag Control v5

Data throughput of database applications on a PGD origin node can exceed the rate at which committed data can safely replicate to downstream peer nodes. If this disparity persists beyond a period of time or chronically in high availability applications, then organizational objectives related to disaster recovery or business continuity plans might not be satisfied.

The replication Lag Control (RLC) feature is designed to regulate this imbalance using a dynamic rate-limiting device so that data flow between PGD group nodes complies with these organizational objectives. It does so by controlling the extent of replication lag between PGD nodes.

Some of these objectives include the following:

  • Recovery point objective (RPO) specifies the maximum-tolerated amount of data that can be lost due to unplanned events, usually expressed as an amount of time. In non-replicated systems, RPO is used to set backup intervals to limit the risk of lost committed data due to failure. For replicated systems, RPO determines the acceptable amount of committed data that hasn't been safely applied to one or more peer nodes.

  • Resource constraint objective (RCO) acknowledges that there are finite storage constraints. This storage includes database files, WAL, and temporary or intermediate files needed for continued operation. For replicated systems, as lag increases the demands on these storage resources also increase.

  • Group elasticity objective (GEO) ensures that any node isn't originating new data at a clip that can't be acceptably saved to its peer nodes. When that's the case, then the detection of that condition can be used as one metric in the decision to expand the number of database nodes. Similarly, when that condition abates then it might influence the decision to contract the number of database nodes.

Lag Control manages replication lag by controlling the rate at which client connections can commit READ WRITE transactions. Replication lag is measured either as lag time or lag size, depending on the objectives to meet. Transaction commit rate is regulated using a configured PGD commit-delay time.


To get started using Lag Control:

  • Determine the maximum acceptable commit delay time max_commit_delay that all database applications can tolerate.

  • Decide on the lag measure to use. Choose either lag size max_lag_size or lag time max_lag_time.

  • Decide on the groups or subgroups involved and the minimum number of nodes in each collection required to satisfy confirmation. This information forms the basis for the definition of a commit scope rule.


You specify lag control in a commit scope, which allows consistent and coordinated parameter settings across the nodes spanned by the commit scope rule. You can include a lag control specification in the default commit scope of a top group or as part of an origin group commit scope.

Using the sample node groups from Commit scope, this example shows Lag Control rules for two data centers:

-- create a Lag Control commit scope with individual rules
-- for each sub-group
SELECT bdr.add_commit_scope(
    commit_scope_name := 'example_scope',
    origin_node_group := 'left_dc',
    rule := 'ALL (left_dc) LAG CONTROL (max_commit_delay=500ms, max_lag_time=30s) AND ANY 1 (right_dc) LAG CONTROL (max_commit_delay=500ms, max_lag_time=30s)',
    wait_for_ready := true
SELECT bdr.add_commit_scope(
    commit_scope_name := 'example_scope',
    origin_node_group := 'right_dc',
    rule := 'ANY 1 (left_dc) LAG CONTROL (max_commit_delay=0.250ms, max_lag_size=100MB) AND ALL (right_dc) LAG CONTROL (max_commit_delay=0.250ms, max_lag_size=100MB)',
    wait_for_ready := true

The parameter values admit unit specification that's compatible with GUC parameter conventions.

You can add a lag control commit scope rule to existing commit scope rules that also include group commit and CAMO rule specifications.

The max_commit_delay parameter permits and encourages a specification of milliseconds with a fractional part, including a submillisecond setting, if appropriate.


Lag Control is a dynamic TPS, rate-limiting mechanism that operates at the client connection level. It's designed to be as unobtrusive as possible while satisfying configured lag-control constraints. This means that if enough PGD nodes can replicate changes fast enough to remain below configured lag measure thresholds, then the PGD runtime commit delay stays fixed at 0 milliseconds.

If this isn't the case, minimally adjust the PGD runtime commit delay as high as needed, but no higher, until the number of conforming nodes returns to the minimum threshold.

Even after the minimum node threshold is reached, Lag Control continues to attempt to drive the PGD runtime commit delay back to zero. The PGD commit delay might rise and fall around an equilibrium level most of the time, but if data throughput or lag-apply rates improve, then the commit delay decreases over time.

The PGD commit delay is a post-commit delay. It occurs after the transaction has committed and after all Postgres resources locked or acquired by the transaction are released. Therefore, the delay doesn't prevent concurrent active transactions from observing or modifying its values or acquiring its resources. The same guarantee can't be made for external resources managed by Postgres extensions. Regardless of extension dependencies, the same guarantee can be made if the PGD extension is listed before extension-based resource managers in postgresql.conf.

Strictly speaking, the PGD commit delay isn't a per-transaction delay. It's the mean value of commit delays over a stream of transactions for a particular client connection. This technique allows the commit delay and fine-grained adjustments of the value to escape the coarse granularity of OS schedulers, clock interrupts, and variation due to system load. It also allows the PGD runtime commit delay to settle within microseconds of the lowest duration possible to maintain a lag measure threshold.


Don't conflate the PGD commit delay with the Postgres commit delay. They are unrelated and perform different functions. Don't substitute one for the other.

Transaction application

The PGD commit delay is applied to all READ WRITE transactions that modify data for user applications. This behavior implies that any transaction that doesn't modify data, including declared READ WRITE transactions, is exempt from the commit delay.

Asynchronous transaction commit also executes a PGD commit delay. This might appear counterintuitive, but asynchronous commit, by virtue of its performance, can be one of the greatest sources of replication lag.

Postgres and PGD auxillary processes don't delay at transaction commit. Most notably, PGD writers don't execute a commit delay when applying remote transactions on the local node. This is by design, as PGD writers contribute nothing to outgoing replication lag and can reduce incoming replication lag the most by not having their transaction commits throttled by a delay.


The maximum commit delay is a ceiling value representing a hard limit, which means that a commit delay never exceeds the configured value. Conversely, the maximum lag measures both by size and time and are soft limits that can be exceeded. When the maximum commit delay is reached, there's no additional back pressure on the lag measures to prevent their continued increase.

There's no way to exempt origin transactions that don't modify PGD replication sets from the commit delay. For these transactions, it can be useful to SET LOCAL the maximum transaction delay to 0.


Application TPS is one of many factors that can affect replication lag. Other factors include the average size of transactions for which PGD commit delay can be less effective. In particular, bulk load operations can cause replication lag to rise, which can trigger a concomitant rise in the PGD runtime commit delay beyond the level reasonably expected by normal applications, although still under the maximum allowed delay.

Similarly, an application with a very high OLTP requirement and modest data changes can be unduly restrained by the acceptable PGD commit delay setting.

In these cases, it can be useful to use the SET [SESSION|LOCAL] command to custom configure Lag Control settings for those applications or modify those applications. For example, bulk load operations are sometimes split into multiple, smaller transactions to limit transaction snapshot duration and WAL retention size or establish a restart point if the bulk load fails. In deference to Lag Control, those transaction commits can also schedule very long PGD commit delays to allow digestion of the lag contributed by the prior partial bulk load.

Meeting organizational objectives

In the example objectives list earlier:

  • RPO can be met by setting an appropriate maximum lag time.
  • RCO can be met by setting an appropriate maximum lag size.
  • GEO can be met by monitoring the PGD runtime commit delay and the PGD runtime lag measures,

As mentioned, when the maximum PGD runtime commit delay is pegged at the PGD configured commit-delay limit, and the lag measures consistently exceed their PGD-configured maximum levels, this scenario can be a marker for PGD group expansion.