Timestamp-based snapshots v4

The timestamp-based snapshots allow reading data in a consistent manner by using a user-specified timestamp rather than the usual MVCC snapshot. You can use this to access data on different BDR nodes at a common point in time. For example, you can use this as a way to compare data on multiple nodes for data-quality checking.

This feature doesn't currently work with write transactions.

Enable the use of timestamp-based snapshots using the snapshot_timestamp parameter. This parameter accepts either a timestamp value or a special value, 'current', which represents the current timestamp (now). If snapshot_timestamp is set, queries use that timestamp to determine visibility of rows rather than the usual MVCC semantics.

For example, the following query returns the state of the customers table at 2018-12-08 02:28:30 GMT:

SET snapshot_timestamp = '2018-12-08 02:28:30 GMT';
SELECT count(*) FROM customers;

Without BDR, this works only with future timestamps or the special 'current' value, so you can't use it for historical queries.

BDR works with and improves on that feature in a multi-node environment. First, BDR makes sure that all connections to other nodes replicate any outstanding data that was added to the database before the specified timestamp. This ensures that the timestamp-based snapshot is consistent across the whole multi-master group. Second, BDR adds a parameter called bdr.timestamp_snapshot_keep. This specifies a window of time when you can execute queries against the recent history on that node.

You can specify any interval, but be aware that VACUUM (including autovacuum) doesn't clean dead rows that are newer than up to twice the specified interval. This also means that transaction ids aren't freed for the same amount of time. As a result, using this can leave more bloat in user tables. Initially, we recommend 10 seconds as a typical setting, although you can change that as needed.

Once the query is accepted for execution, the query might run for longer than bdr.timestamp_snapshot_keep without problem, just as normal.

Also, information about how far the snapshots were kept doesn't survive server restart. The oldest usable timestamp for the timestamp-based snapshot is the time of last restart of the PostgreSQL instance.

You can combine the use of bdr.timestamp_snapshot_keep with the postgres_fdw extension to get a consistent read across multiple nodes in a BDR group. You can use this to run parallel queries across nodes, when used with foreign tables.

There are no limits on the number of nodes in a multi-node query when using this feature.

Use of timestamp-based snapshots doesn't increase inter-node traffic or bandwidth. Only the timestamp value is passed in addition to query data.