PostgreSQL Replication and Automatic Failover Tutorial

January 24, 2023

Table of Contents

1. What Is PostgreSQL Replication? 

2. What Is Automatic Failover in PostgreSQL? 

3. High Availability and Failover Replication 

4. Why Use PostgreSQL Replication? 

5. What Are the Models of PostgreSQL Database Replication (Single-Master & Multi-Master)? 

6. What Are the Classes of PostgreSQL Replication? 

7. What Are the Replication Modes in PostgreSQL Database? 

8. What Are the Types of PostgreSQL Database Replication? 

9. Introduction to Write-Ahead Logging (WAL) in PostgreSQL Database 

       9.1 What is Write-Ahead Logging (WAL) in PostgreSQL and Why Is It Required? 

10. What Are the Transaction Log and WAL Segment Files in PostgreSQL? 

11. What Is the WAL Writer in PostgreSQL? 

12. WAL Segment File Management 

      12.1 Where are WAL segment files stored? 

      12.2 What are the conditions under which PostgreSQL switches to a new WAL segment file? 

13. Write-Ahead Logging (WAL) Example for PostgreSQL 

14. What Are The Replication Options Based on WAL in PostgreSQL? 

      14.1 Continuous WAL Archiving 

      14.2 Log Shipping Based Replication — File Level

      14.3 Log Shipping Based Replication — Block Level

      14.4 How long should the primary server retain WAL segment files? 

15. Log Shipping Based Replication 

      15.1 Physical Streaming Replication 

      15.2 What are the advantages of physical streaming replication? 

16. What Are WAL Sender and WAL Receiver in PostgreSQL? 

      16.1 WAL Streaming Protocol Details 

17. PostgreSQL Replication and Failover Setup 

      17.1 Configuring PostgreSQL Replication using WAL Streaming 

18. What Are the PostgreSQL Manual Failover Steps?

19. How to Automate Failover and Replication in PostgreSQL 

20. repmgr for PostgreSQL

 

1. What Is PostgreSQL Replication?

The process of copying data from a PostgreSQL database server to another server is called PostgreSQL Replication. The source database server is usually called the Master server, whereas the database server receiving the copied data is called the Replica server.

PostgreSQL Replication

2. What Is Automatic Failover in PostgreSQL?

Once physical streaming replication has been set up and configured in PostgreSQL, failover can take place if the primary database server fails. Failover is the term for the recovery process, which in PostgreSQL can take some time, particularly because PostgreSQL itself does not provide built-in tools for detecting server failures. Fortunately, there are tools available that provide Automatic Failover, which can detect failures and automatically switch to the standby, minimizing database downtime. 

EnterpriseDB’s EDB Postgres Failover Manager lets you automatically detect database failures and promote the most current standby server as the new master, helping to avoid costly database downtime. EDB Failover Manager also provides fast, automatic failure detection.

 

3. High Availability and Failover Replication

High Availability refers to database systems that are set up so that standby servers can take over quickly when the master or primary server fails. To achieve high availability, a database system should meet some key requirements: it should have redundancy to prevent single points of failure, reliable switchover mechanisms, and active monitoring to detect any failures that may occur.  Setting up failover replication provides the needed redundancy to allow for high availability by ensuring that standbys are available if the master or primary server ever goes down. 

 

4. Why Use PostgreSQL Replication?

Replication of data can have many uses:

  • OLTP Performance
  • Fault Tolerance
  • Data Migration
  • Testing Systems in Parallel

OLTP performance: Removing reporting query load from the online transaction processing (OLTP) system improves both reporting query time and transaction processing performance.

Fault tolerance: In the event of master database server failure, the replica server can take over, since it already contains the master server’s data. In this configuration the replica server is also called the standby server. This configuration can also be used for regular maintenance of the primary server.

Data migration: To upgrade database server hardware, or to deploy the same system for another customer.

Testing systems in parallel: When porting the application from one DBMS to another, the results on the same data from both the old and new systems must be compared to ensure that the new system works as expected.

EnterpriseDB’s EDB Postgres Replication Server makes managing Postgres replication easy, offering all the benefits replication provides.

 

5. What Are the Models of PostgreSQL Database Replication (Single-Master & Multi-Master)?

  • Single-Master Replication (SMR)
  • Multi-Master Replication (MMR)

In Single-Master Replication (SMR), changes to table rows in a designated master database server are replicated to one or more replica servers. The replicated tables in the replica database are not permitted to accept changes (except those coming from the master), and even if they do, those changes are not replicated back to the master server.

In Multi-Master Replication (MMR), changes to table rows in more than one designated master database are replicated to their counterpart tables in every other master database. In this model conflict resolution schemes are often employed to avoid problems like duplicate primary keys.

PostgreSQL Database Replication

 

Multi-Master Replication (MMR) adds to the uses of replication:

  • Write availability and scalability.
  • Ability to employ a wide area network (WAN) of master databases that can be geographically close to groups of clients, yet maintain data consistency across the network.

 

6. What Are the Classes of PostgreSQL Replication?

  • Unidirectional Replication
  • Bidirectional Replication

Single-Master Replication is also called unidirectional, since replication data flows in one direction only, from master to replica. 

Multi-Master Replication data, on the other hand, flows in both directions, and it is therefore called bidirectional replication.

 

7. What Are the Replication Modes in PostgreSQL Database?

  • Asynchronous Mode of Replication
  • Synchronous Mode of Replication

In synchronous mode replication, transactions on the master database are declared complete only when those changes have been replicated to all the replicas. The replica servers must all be available all the time for the transactions to complete on the master.

Synchronous mode PostgreSQL replication

 

In asynchronous mode, transactions on the master server can be declared complete as soon as the changes have been made on the master server alone. These changes are then replicated to the replicas later. The replica servers can remain out of sync for a certain duration, which is called replication lag.

Replication Modes in PostgreSQL Database

 

Synchronous and asynchronous modes both have their costs and benefits, and users will want to consider safety and performance when configuring their replication settings.
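For illustration, here is a minimal, hedged sketch of the postgresql.conf settings on the primary that control this behavior; the standby name 's1' is an assumption, and leaving synchronous_standby_names empty falls back to asynchronous replication:

# postgresql.conf on the primary (illustrative sketch)

synchronous_commit = on              # wait for the standby before reporting COMMIT

synchronous_standby_names = 's1'     # 's1' is an assumed standby application_name

# synchronous_standby_names = ''     # leaving this empty makes replication asynchronous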

 

8. What Are the Types of PostgreSQL Database Replication?

  • Physical Replication of PostgreSQL Database
  • Logical Replication of PostgreSQL Database

Before we discuss physical and logical replication types, let’s review the context of the terms “physical” and “logical” here.

                

    | Logical Operation | Physical Operation
----+-------------------+------------------------------------------------------------------------------------------------------------
  1 | initdb            | Creates a base directory for the cluster
  2 | CREATE DATABASE   | Creates a subdirectory in the base directory
  3 | CREATE TABLE      | Creates a file within the subdirectory of the database
  4 | INSERT            | Changes the file that was created for this particular table and writes new WAL records into the current WAL segment

 

For example:

ramp=# create table sample_tbl(a int, b varchar(255));

CREATE TABLE

ramp=# SELECT pg_relation_filepath('sample_tbl');

 pg_relation_filepath 

----------------------

 base/34740/706736

(1 row)

ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp';

 datname |  oid  

---------+-------

 ramp    | 34740

(1 row)

ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl';

  relname   |  oid   

------------+--------

 sample_tbl | 706736

(1 row)

 

Physical replication deals with files and directories. It has no knowledge of what these files and directories represent. Physical replication is done at file system level or disk level.

Logical replication, on the other hand, deals with databases, tables, and DML operations. It is therefore possible in logical replication to replicate only a certain set of tables. Logical replication operates at the level of databases and tables within the cluster rather than at the file-system level.
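As a brief, hedged illustration of the logical level, PostgreSQL 10 and later include built-in logical replication based on publications and subscriptions. The publication, subscription, and connection values below are assumptions, and the publisher must run with wal_level = logical:

-- On the publisher (source) database:

CREATE PUBLICATION my_pub FOR TABLE sample_tbl;

-- On the subscriber, which must already contain a matching table definition:

CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=172.16.214.167 port=5432 dbname=ramp user=postgres'
    PUBLICATION my_pub;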

 

9. Introduction to Write-Ahead Logging (WAL) in PostgreSQL Database

9.1 What is Write-Ahead Logging (WAL) in PostgreSQL and Why Is It Required?

In PostgreSQL, all changes made by a transaction are first saved in a log file, and then the result of the transaction is sent to the initiating client. Data files themselves are not changed on every transaction. This is a standard mechanism to prevent data loss in case of circumstances like operating system crash, hardware failure, or PostgreSQL crash. This mechanism is called Write Ahead Logging (WAL), and the log file is called Write Ahead Log.

Each change that the transaction performs (INSERT, UPDATE, DELETE, COMMIT) is written in the log as a WAL record. WAL records are first written into an in-memory WAL buffer. When the transaction is committed the records are written into a WAL segment file on the disk.

A WAL record’s Log Sequence Number (LSN) represents the location/position where the record is saved in the log file. LSN is used as a unique id of the WAL record. Logically, a transaction log is a file whose size is 2^64 bytes. LSN is therefore a 64-bit number represented as two 32-bit hexadecimal numbers separated by a /. For example:

select pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/2BDBBD0

(1 row)

 

If there is a system crash, the database can recover committed transactions from the WAL. The recovery starts from the last REDO point, i.e., the last checkpoint. A checkpoint is a point in the transaction log at which all data files have been updated to match the information in the log. The process of flushing all the changes described by WAL records up to that point from the shared buffers to the actual data files is called checkpointing.
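As a quick check, the location of the latest checkpoint and its REDO point can be read from the cluster's control data; a minimal example (output values will differ per cluster):

-- Show the latest checkpoint location and the REDO point it recorded

SELECT checkpoint_lsn, redo_lsn, timeline_id FROM pg_control_checkpoint();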

Let’s consider a case where the database crashes after two transactions that perform one insert each and WAL is used for recovery.

  1. Assume a checkpoint has been issued, which stores the location of the latest REDO point in the current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk. This action guarantees that WAL records before the REDO point are no longer needed for recovery, since all data has been flushed to the disk pages.
  2. The first INSERT statement is issued. The table’s page is loaded from disk to the buffer pool.
  3. A tuple is inserted into the loaded page.
  4. The WAL record of this insert is saved into the WAL buffer at location LSN_1.
  5. The page’s LSN is updated from LSN_0 to LSN_1, which identifies the WAL record for the last change to this page.
  6. The first COMMIT statement is issued.
  7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records in the WAL buffer up to and including this record are flushed to the WAL segment file.
  8. For the second INSERT and commit, steps 2 to 7 are repeated.

WAL in PostgreSQL

 

If there is an operating system crash, all the data in the shared buffer pool will be lost. However, all modifications to the pages have already been written into the WAL segment files as history data. The following steps show how our database cluster can be restored to the state immediately before the crash using WAL records. Nothing special needs to be done; PostgreSQL will automatically enter recovery mode after restarting.

  1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL segment file.
  2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool.
  3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the page's LSN is updated to LSN_1.

The remaining WAL records are replayed in a similar manner.

WAL Records in PostgreSQL

 

10. What Are the Transaction Log and WAL Segment Files in PostgreSQL?

The PostgreSQL transaction log is a virtual file whose logical capacity is 2^64 bytes, addressed by 8-byte (64-bit) LSNs. Physically, the log is divided into 16 MB files (by default), each of which is called a WAL segment.

The WAL segment file name is a 24-character hexadecimal number whose naming rule is as follows:

WAL Segment File in PostgreSQL

 

Assuming that the current timeline ID is 0x00000001, the first WAL segment file names will be:

00000001 00000000 00000000

00000001 00000000 00000001

00000001 00000000 00000002

...

00000001 00000001 00000000

00000001 00000001 00000001

00000001 00000001 00000002

...

00000001 FFFFFFFF FFFFFFFD

00000001 FFFFFFFF FFFFFFFE

00000001 FFFFFFFF FFFFFFFF

 

For example:

select pg_walfile_name('0/2BDBBD0');

     pg_walfile_name      

--------------------------

 000000010000000000000002

 

11. What Is the WAL Writer in PostgreSQL?

The WAL writer is a background process that periodically checks the WAL buffer and writes any unwritten WAL records to the WAL segment files. The WAL writer avoids bursts of I/O activity by spreading its writes over time. The configuration parameter wal_writer_delay controls how often the WAL writer flushes the WAL; the default value is 200 ms.
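You can inspect this setting from psql; for example (assuming the default value):

SHOW wal_writer_delay;

 wal_writer_delay 

------------------

 200ms

(1 row)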

 

12. WAL Segment File Management

12.1 Where are WAL segment files stored?

WAL segment files are stored in the pg_wal subdirectory of the data directory (called pg_xlog in PostgreSQL versions before 10). 
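For example, the segment files can be listed either from the shell or, on PostgreSQL 10 and later, from psql:

# From the shell (assuming $PGDATA points at the data directory):

ls $PGDATA/pg_wal

-- From psql:

SELECT name, size FROM pg_ls_waldir() ORDER BY name LIMIT 3;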

 

12.2 What are the conditions under which PostgreSQL switches to a new WAL segment file?

PostgreSQL switches to a new WAL segment file under the following conditions:

  • The WAL segment has been filled up.
  • The function pg_switch_wal() has been called (see the example after this list).
  • archive_mode is enabled and the time set in archive_timeout has been exceeded. 
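For example, a manual switch can be forced and verified from psql (the returned LSN and file name will differ):

SELECT pg_switch_wal();

SELECT pg_walfile_name(pg_current_wal_lsn());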

After they are switched out, WAL files can either be removed or recycled, i.e., renamed and reused later. The number of WAL files that the server retains at any point in time depends on server configuration and server activity.

Whenever a checkpoint starts, PostgreSQL estimates and prepares the number of WAL segment files required for the upcoming checkpoint cycle. The estimate is based on the number of files consumed in previous checkpoint cycles; the files are counted from the segment that contains the prior REDO point, and the value is kept between min_wal_size (by default 80 MB, i.e., 5 files) and max_wal_size (by default 1 GB, i.e., 64 files). When a checkpoint starts, the files that are still needed are retained and recycled, while the unnecessary ones are removed.

An example is provided in the diagram below. Assume that there are six files before checkpoint starts, the previous REDO point is contained in the file WAL_3, and PostgreSQL estimates that five files are to be retained. In this case, WAL_1 will be renamed as WAL_7 for recycling, and WAL_2 will be removed.

WAL Segment File Management in PostgreSQL

 

13. Write-Ahead Logging (WAL) Example for PostgreSQL

Step 1:

SELECT datname, oid FROM pg_database WHERE datname = 'postgres';

datname  |  oid  

----------+-------

postgres | 15709

Note the database OID, i.e. 15709

Step 2:

SELECT oid,* from pg_tablespace;

oid  |  spcname   | spcowner | spcacl | spcoptions 

------+------------+----------+--------+------------

1663 | pg_default |       10 |        | 

1664 | pg_global  |       10 |        | 

Note the table space OID, i.e. 1663

Step 3:

SELECT pg_current_wal_lsn();

pg_current_wal_lsn 

--------------------

0/1C420B8

Note the LSN, i.e. 0/1C420B8

Step 4:

CREATE TABLE abc(a VARCHAR(10));

Step 5:

SELECT pg_relation_filepath('abc');

pg_relation_filepath 

----------------------

base/15709/16384

Note the relation filename, base/15709/16384

Step 6:

View the WAL contents using: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8

Using the start LSN noted in step 3.

Note that the WAL contains the instruction to create the physical file:

15709 → database postgres → noted in step 1

16384 → table abc → noted in step 5

rmgr    | len (rec/tot) | tx   | lsn        | prev       | desc
--------+---------------+------+------------+------------+----------------------------------------------------------------------
XLOG    | 30/30         | 0    | 0/01C420B8 | 0/01C42080 | NEXTOID 24576
Storage | 42/42         | 0    | 0/01C420D8 | 0/01C420B8 | CREATE base/15709/16384
Heap    | 203/203       | 1216 | 0/01C42108 | 0/01C420D8 | INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0
Btree   | 64/64         | 1216 | 0/01C421D8 | 0/01C42108 | INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree   | 64/64         | 1216 | 0/01C42218 | 0/01C421D8 | INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5
Heap    | 80/80         | 1216 | 0/01C42258 | 0/01C42218 | INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9
Btree   | 72/72         | 1216 | 0/01C422A8 | 0/01C42258 | INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree   | 72/72         | 1216 | 0/01C422F0 | 0/01C422A8 | INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61
Heap    | 203/203       | 1216 | 0/01C42338 | 0/01C422F0 | INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1
Btree   | 64/64         | 1216 | 0/01C42408 | 0/01C42338 | INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree   | 72/72         | 1216 | 0/01C42448 | 0/01C42408 | INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1
Heap    | 80/80         | 1216 | 0/01C42490 | 0/01C42448 | INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9
Btree   | 72/72         | 1216 | 0/01C424E0 | 0/01C42490 | INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree   | 72/72         | 1216 | 0/01C42528 | 0/01C424E0 | INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57
Heap    | 199/199       | 1216 | 0/01C42570 | 0/01C42528 | INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0
Btree   | 64/64         | 1216 | 0/01C42638 | 0/01C42570 | INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2
Btree   | 64/64         | 1216 | 0/01C42678 | 0/01C42638 | INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1
Btree   | 64/64         | 1216 | 0/01C426B8 | 0/01C42678 | INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5
Heap    | 171/171       | 1216 | 0/01C426F8 | 0/01C426B8 | INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16
Btree   | 64/64         | 1216 | 0/01C427A8 | 0/01C426F8 | INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C427E8 | 0/01C427A8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42828 | 0/01C427E8 | INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16
Btree   | 72/72         | 1216 | 0/01C428D8 | 0/01C42828 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42920 | 0/01C428D8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42960 | 0/01C42920 | INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16
Btree   | 72/72         | 1216 | 0/01C42A10 | 0/01C42960 | INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42A58 | 0/01C42A10 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42A98 | 0/01C42A58 | INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17
Btree   | 72/72         | 1216 | 0/01C42B48 | 0/01C42A98 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42B90 | 0/01C42B48 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42BD0 | 0/01C42B90 | INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17
Btree   | 72/72         | 1216 | 0/01C42C80 | 0/01C42BD0 | INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42CC8 | 0/01C42C80 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42D08 | 0/01C42CC8 | INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17
Btree   | 72/72         | 1216 | 0/01C42DB8 | 0/01C42D08 | INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42E00 | 0/01C42DB8 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 171/171       | 1216 | 0/01C42E40 | 0/01C42E00 | INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32
Btree   | 72/72         | 1216 | 0/01C42EF0 | 0/01C42E40 | INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25
Btree   | 64/64         | 1216 | 0/01C42F38 | 0/01C42EF0 | INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap    | 80/80         | 1216 | 0/01C42F78 | 0/01C42F38 | INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11
Btree   | 72/72         | 1216 | 0/01C42FC8 | 0/01C42F78 | INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44
Btree   | 72/72         | 1216 | 0/01C43010 | 0/01C42FC8 | INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46
Standby | 42/42         | 1216 | 0/01C43058 | 0/01C43010 | LOCK xid 1216 db 15709 rel 16384
Txn     | 405/405       | 1216 | 0/01C43088 | 0/01C43058 | COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608 relcache 16384
Standby | 50/50         | 0    | 0/01C43220 | 0/01C43088 | RUNNING_XACTS nextXid 1217 latestCompletedXid 1216 oldestRunningXid 1217

Step 7:

SELECT pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/1C43258

(1 row)

Step 8:

INSERT INTO abc VALUES('pkn');

Step 9:

./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258

Using the start LSN from step 7.

1663 → pg_default tablespace → noted in step 2

15709 → database postgres → noted in step 1

16384 → table abc → noted in step 5

rmgr        | len (rec/tot) | tx   | lsn        | prev       | desc
------------+---------------+------+------------+------------+------------------------------------------------------------------------------
Heap        | 59/59         | 1217 | 0/01C43258 | 0/01C43220 | INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0
Transaction | 34/34         | 1217 | 0/01C43298 | 0/01C43258 | COMMIT 2019-03-04 07:43:45.887511 EST
Standby     | 54/54         | 0    | 0/01C432C0 | 0/01C43298 | RUNNING_XACTS nextXid 1218 latestCompletedXid 1216 oldestRunningXid 1217; 1 xacts: 1217

Step 10:

SELECT pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/1C432F8

(1 row)

Step 11:

INSERT INTO abc VALUES('ujy');

Step 12:

./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8

Using the start LSN from step 10.

rmgr        | len (rec/tot) | tx   | lsn        | prev       | desc
------------+---------------+------+------------+------------+------------------------------------------------------------------------------
Heap        | 59/59         | 1218 | 0/01C432F8 | 0/01C432C0 | INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0
Transaction | 34/34         | 1218 | 0/01C43338 | 0/01C432F8 | COMMIT 2019-03-04 07:44:25.449151 EST
Standby     | 50/50         | 0    | 0/01C43360 | 0/01C43338 | RUNNING_XACTS nextXid 1219 latestCompletedXid 1218 oldestRunningXid 1219

Step 13: Check the actual tuples in the WAL segment files.

---------+---------------------------------------------------+----------------+

 Offset  | Hex Bytes                                         | ASCII chars    |

---------+---------------------------------------------------+----------------+

00000060 | 3b 00 00 00 c3 04 00 00  28 00 40 02 00 00 00 00  |;.......(.@.....|

00000070 | 00 0a 00 00 ec 28 75 6e  00 20 0a 00 7f 06 00 00  |.....(un. ......|

00000080 | 5d 3d 00 00 00 40 00 00  00 00 00 00 ff 03 01 00  |]=...@..........|

00000090 | 02 08 18 00 09 70 6b 6e  03 00 00 00 00 00 00 00  |.....pkn........|

000000a0 | 22 00 00 00 c3 04 00 00  60 00 40 02 00 00 00 00  |".......`.@.....|

000000b0 | 00 01 00 00 dd 4c 87 04  ff 08 e4 73 44 e7 41 26  |.....L.....sD.A&|

000000c0 | 02 00 00 00 00 00 00 00  32 00 00 00 00 00 00 00  |........2.......|

000000d0 | a0 00 40 02 00 00 00 00  10 08 00 00 9e 01 36 88  |..@...........6.|

000000e0 | ff 18 00 00 00 00 00 00  00 00 00 03 00 00 c4 04  |................|

000000f0 | 00 00 c4 04 00 00 c3 04  00 00 00 00 00 00 00 00  |................|



00000100 | 3b 00 00 00 c4 04 00 00  c8 00 40 02 00 00 00 00  |;.........@.....|

00000110 | 00 0a 00 00 33 df b4 71  00 20 0a 00 7f 06 00 00  |....3..q. ......|

00000120 | 5d 3d 00 00 00 40 00 00  00 00 00 00 ff 03 01 00  |]=...@..........|

00000130 | 02 08 18 00 09 75 6a 79  04 00 00 00 00 00 00 00  |.....ujy........|

00000140 | 22 00 00 00 c4 04 00 00  00 01 40 02 00 00 00 00  |".........@.....|

00000150 | 00 01 00 00 96 2e 96 a6  ff 08 d8 f3 79 ed 41 26  |............y.A&|

00000160 | 02 00 00 00 00 00 00 00  32 00 00 00 00 00 00 00  |........2.......|

00000170 | 40 01 40 02 00 00 00 00  10 08 00 00 eb 6b 95 36  |@.@..........k.6|

00000180 | ff 18 00 00 00 00 00 00  00 00 00 03 00 00 c5 04  |................|

00000190 | 00 00 c5 04 00 00 c4 04  00 00 00 00 00 00 00 00  |................|

000001a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

 

14. What Are The Replication Options Based on WAL in PostgreSQL?

  • Continuous WAL Archiving
  • Log Shipping Based Replication — File Level
  • Log Shipping Based Replication — Block Level

14.1 Continuous WAL Archiving

Copying WAL files as they are generated to a location other than the pg_wal subdirectory, for the purpose of archiving them, is called WAL archiving. Each time a WAL file is generated, PostgreSQL invokes a user-provided script (configured via archive_command) to archive it. The script can use the scp command to copy the file to one or more locations, such as an NFS mount. Once archived, the WAL segment files can be used to recover the database to any specified point in time.
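A minimal, illustrative archiving setup in postgresql.conf might look like the following; the archive directory is an assumption, and the command mirrors the example given in the PostgreSQL documentation:

archive_mode = on

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

archive_timeout = 300    # optionally force a segment switch at least every 5 minutes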

14.2 Log Shipping Based Replication — File Level

Copying WAL files to another PostgreSQL server, which creates a standby server by replaying those WAL files, is called Log Shipping. The standby server is configured to be in recovery mode, and its sole purpose is to apply any new WAL files as they arrive. This second server (the standby) then becomes a warm backup of the primary PostgreSQL server. The standby can also be configured to serve read-only queries, in which case it is called a hot standby.

14.3 Log Shipping Based Replication — Block Level

Streaming replication improves the log shipping process. Instead of waiting for the WAL switch, the records are sent as they are generated, thus reducing replication delay. Another improvement is that the standby server will connect to the primary server over the network using a replication protocol. The primary server can then send WAL records directly over this connection without having to rely on scripts provided by the end user.

14.4 How long should the primary server retain WAL segment files?

If there are no streaming replication clients, the server can discard or recycle WAL segment files once the archive script reports success, provided they are no longer required for crash recovery.

The presence of standby clients, though, creates a problem: the server needs to keep WAL files around for as long as the slowest standby needs them. If, for example, the standby was taken down for a while, then comes back online and asks the primary for a WAL file that the primary no longer has, the replication fails with an error similar to:

ERROR: requested WAL segment 00000001000000010000002D has already been removed

The primary server should therefore keep track of how far behind the standby is, and not delete/recycle WAL files that any standbys still need. This feature is provided through replication slots.

Each replication slot has a name which is used to identify the slot. Each slot is associated with:

  1. The oldest WAL segment file required by the consumer of the slot. WAL segment files more recent than this are not deleted/recycled during checkpoints.
  2. The oldest transaction ID required to be retained by the consumer of the slot. Rows needed by any transactions more recent than this are not deleted by vacuum.
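Both pieces of information can be inspected on the primary; for example (slot names and positions will vary):

SELECT slot_name, slot_type, active, restart_lsn, xmin FROM pg_replication_slots;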

 

15. Log Shipping Based Replication

15.1 Physical Streaming Replication

Physical Streaming Replication in PostgreSQL

Physical Streaming Replication in PostgreSQL database is the data replication based on WAL. In streaming replication the standby server connects to the primary server and receives WAL records using a replication protocol. 

15.2 What are the advantages of physical streaming replication? 

  • The standby server does not need to wait for the WAL file to fill up, which reduces replication lag.
  • The dependency on the user-provided script and an intermediate shared storage between the servers is removed.

 

16. What Are WAL Sender and WAL Receiver in PostgreSQL?

A process called WAL receiver, running on the standby server, uses the connection details provided in the primary_conninfo parameter of recovery.conf and connects to the primary server using a TCP/IP connection. 

The WAL sender is another process, running on the primary server, that is in charge of sending the WAL records to the standby server as they are generated. The WAL receiver writes the received WAL records to the standby's own WAL segment files, as if they had been generated by locally connected client activity. Once the WAL records reach the WAL segment files, the standby server continuously replays them so that the standby stays up to date with the primary.
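Both processes can be observed from psql; a minimal sketch (column names as of PostgreSQL 10, and the output will vary with your setup):

-- On the primary: one row per connected WAL sender

SELECT pid, application_name, state, sent_lsn, replay_lsn FROM pg_stat_replication;

-- On the standby: status of the WAL receiver

SELECT pid, status, received_lsn FROM pg_stat_wal_receiver;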

16.1 WAL Streaming Protocol Details

WAL Streaming Protocol Details in PostgreSQL

 

17. PostgreSQL Replication and Failover Setup

The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 is installed.

17.1 Configuring PostgreSQL Replication using WAL Streaming

Step 1: Disable and stop the firewall on both machines:

sudo firewall-cmd --state

sudo systemctl stop firewalld

sudo systemctl disable firewalld

sudo systemctl mask --now firewalld

Step 2: On the primary server, allow replication connections and connections from the same network. Modify pg_hba.conf:

local  all           all                       md5

host   all           all   172.16.214.167/24   md5

host   all           all   ::1/128             md5

local  replication   all                       md5

host   replication   all   172.16.214.167/24   md5

host   replication   all   ::1/128             md5

Step 3: On the primary server, edit postgresql.conf to modify the following parameters:

max_wal_senders = 10

wal_level = replica

max_replication_slots = 10

synchronous_commit = on

synchronous_standby_names = '*'

listen_addresses = '*'

Step 4: Start the primary server: 

./postgres -D ../pr_data -p 5432

Step 5: Take base backup to bootstrap the standby server: 

./pg_basebackup \
  --pgdata=/tmp/sb_data/ \
  --format=p \
  --write-recovery-conf \
  --checkpoint=fast \
  --label=mffb \
  --progress \
  --verbose \
  --host=172.16.214.167 \
  --port=5432 \
  --username=postgres

Step 6: Check the base backup label file:

START WAL LOCATION: 0/2000028 (file 000000010000000000000002)

CHECKPOINT LOCATION: 0/2000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2019-02-24 05:25:30 EST

LABEL: mffb

Step 7: In the base backup, add the following line in the recovery.conf:

primary_slot_name = 'node_a_slot'

Step 8: Check the /tmp/sb_data/recovery.conf file

standby_mode = 'on'

primary_conninfo = 'user=enterprisedb password=abc123 host=172.16.214.167 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

primary_slot_name = 'node_a_slot'

Step 9: Connect to the primary server and issue this command:

edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');

      slot_name  | xlog_position 

    -------------+---------------

     node_a_slot | 

    (1 row)



edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots;

      slot_name  | slot_type | active 

    -------------+-----------+--------

     node_a_slot | physical  | f

    (1 row)

Step 10: Transfer the base backup to the standby server: 

scp /tmp/sb_data.tar.gz abbas@172.16.214.166:/tmp

sudo mv /tmp/sb_data /opt/PostgreSQL/10/

sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/

sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/

sudo chmod 700 /opt/PostgreSQL/10/sb_data/

Step 11: Start the standby server:

./postgres -D ../sb_data/ -p 5432

 

The primary will show this in the log:

LOG:  standby "walreceiver" is now a synchronous standby with priority 1

 

The standby will show:

LOG:  database system was interrupted; last known up at 2018-10-24 15:49:55

LOG:  entering standby mode

LOG:  redo starts at 0/3000028

LOG:  consistent recovery state reached at 0/30000F8

LOG:  started streaming WAL from primary at 0/4000000 on timeline 1

 

Step 12: Connect to the primary server and issue some simple commands:

-bash-4.2$ ./edb-psql -p 5432 edb

Password: 

psql.bin (10.7)

Type "help" for help.



create table abc(a int, b varchar(250));

insert into abc values(1,'One');

insert into abc values(2,'Two');

insert into abc values(3,'Three');

Step 13: Check the data on the replica:

./psql -p 5432 -U postgres postgres

Password for user postgres: 

psql.bin (10.7)

Type "help" for help.

postgres=# select * from abc;

a |   b   

---+-------

1 | One

2 | Two

3 | Three

(3 rows)

 

18. What Are the PostgreSQL Manual Failover Steps?

Step 1: Stop or crash the primary server (to simulate a failure).

Step 2: Promote the standby server by running the following command on the standby server:

./pg_ctl promote -D ../sb_data/

server promoting

Step 3: Connect to the promoted standby server and insert a row:

-bash-4.2$ ./edb-psql -p 5432 edb

Password: 

psql.bin (10.7)

Type "help" for help.



edb=# insert into abc values(4,'Four');

The fact that this insert works means that the standby, which is otherwise a read-only server, has been promoted to be the new primary server.
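Another way to confirm the promotion is to check whether the server is still in recovery; on the promoted server this should now return false:

edb=# SELECT pg_is_in_recovery();

 pg_is_in_recovery 

-------------------

 f

(1 row)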

 

19. How to Automate Failover and Replication in PostgreSQL

Setting up automatic failover is easy using EDB Postgres Failover Manager (EFM). After downloading and installing EFM on each master and standby node, you can create an EFM Cluster, which consists of a Master node, one or more Standby nodes, and an optional Witness node that confirms assertions in case of failure. 

EFM continuously monitors system health and sends email alerts based on system events. When a failure occurs, it automatically switches over to the most up-to-date standby and reconfigures all other standby servers to recognize the new master. It also reconfigures load balancers (such as pgPool) and prevents “split brain” (when two nodes each think they are primary) from occurring.

 

20. repmgr for PostgreSQL

Another open source tool is repmgr (Replication Manager), which also manages replication and failover for PostgreSQL clusters. EDB has provided an in-depth tutorial for installing and running repmgr for PostgreSQL. When correctly configured, repmgr can detect when the master server fails and perform automatic failover.

 
