PostgreSQL Automatic Failover Replication Tutorial


Abbas Butt Senior Architect

Table of Contents

1. What is Replication? 

2. Why use Replication? 

3. Models of Replication (Single-Master & Multi-Master) 

4. Classes of Replication (Unidirectional & Bidirectional) 

5. Modes of Replication (Asynchronous & Synchronous) 

6. Types of Replication (Physical & Logical) 

7. Introduction to WAL 

         7.1 What is WAL and why is it required? 

         7.2 Transaction Log and WAL Segment Files 

         7.3 WAL Writer 

         7.4 WAL Segment File Management 

         7.5 WAL Example 

          7.6 Overview of Replication Options based on WAL 

8. Log Shipping Based Replication 

          8.1 Physical Streaming Replication 

          8.2 WAL Sender and WAL Receiver 

          8.3 WAL Streaming Protocol Details 

          8.4 Setup 

          8.5 Configuring PostgreSQL Replication using WAL Streaming 

          8.6 Steps to perform Failover 

 

 

1. What is Replication?

Replication is the process of copying data from one database server to another. The source database server is usually called the Master server, whereas the database server receiving the copied data is called the Replica server.

[Diagram: What is replication?]

 

2. Why use Replication?

Replication of data can have many uses:

  • OLTP performance: Removing reporting query load from the online transaction processing (OLTP) system improves both reporting query time and transaction processing performance.
  • Fault tolerance: In the event of master database server failure, the replica server can take over, since it already contains the master server’s data. In this configuration the replica server is also called the standby server. This configuration can also be used for regular maintenance of the primary server.
  • Data migration: To upgrade database server hardware, or to deploy the same system for another customer.
  • Testing systems in parallel: When porting the application from one DBMS to another, the results on the same data from both the old and new systems must be compared to ensure that the new system works as expected.

 

3. Models of Replication (Single-Master & Multi-Master)

In Single-Master Replication (SMR), changes to table rows in a designated master database server are replicated to one or more replica servers. The replicated tables in the replica database are not permitted to accept any changes (except from the master), and even if changes are made there, they are not replicated back to the master server.

In Multi-Master Replication (MMR), changes to table rows in more than one designated master database are replicated to their counterpart tables in every other master database. In this model conflict resolution schemes are often employed to avoid problems like duplicate primary keys.

[Diagram: Models of replication]

 

MMR adds to the uses of replication:

  • Write availability and scalability.
  • Ability to employ a wide area network (WAN) of master databases that can be geographically close to groups of clients, yet maintain data consistency across the network.

 

4. Classes of Replication (Unidirectional & Bidirectional)

Single-Master Replication is also called unidirectional, since replication data flows in one direction only, from master to replica. Multi-Master Replication data, on the other hand, flows in both directions, and it is therefore called bidirectional replication.

 

5. Modes of Replication (Asynchronous & Synchronous)

In synchronous mode replication, transactions on the master database are declared complete only when those changes have been replicated to all the replicas. The replica servers must all be available all the time for the transactions to complete on the master.

[Diagram: Modes of replication]

In asynchronous mode, transactions on the master server are declared complete as soon as the changes have been made on the master server itself. These changes are replicated to the replicas later. The replica servers can therefore remain out of sync with the master for a period of time, which is called the replication lag.

[Diagram: Modes of replication (2)]

 

6. Types of Replication (Physical & Logical)

Before we discuss physical and logical replication types, let’s review the context of the terms “physical” and “logical” here.

 

     Logical Operation   Physical Operation
  1  initdb              Creates a base directory for the cluster
  2  CREATE DATABASE     Creates a subdirectory in the base directory
  3  CREATE TABLE        Creates a file within the subdirectory of the database
  4  INSERT              Changes the file created for this table and writes
                         new WAL records in the current WAL segment

 

For example:

ramp=# create table sample_tbl(a int, b varchar(255));

CREATE TABLE

ramp=# SELECT pg_relation_filepath('sample_tbl');

 pg_relation_filepath 

----------------------

 base/34740/706736

(1 row)

ramp=# SELECT datname, oid FROM pg_database WHERE datname = 'ramp';

 datname |  oid  

---------+-------

 ramp    | 34740

(1 row)

ramp=# SELECT relname, oid FROM pg_class WHERE relname = 'sample_tbl';

  relname   |  oid   

------------+--------

 sample_tbl | 706736

(1 row)

 

Physical replication deals with files and directories. It has no knowledge of what these files and directories represent. Physical replication is done at file system level or disk level.

Logical replication, on the other hand, deals with databases, tables, and DML operations. It is therefore possible in logical replication to replicate a certain set of tables only. Logical replication is done at the database cluster level.

 

7. Introduction to WAL

7.1 What is WAL and why is it required?

In PostgreSQL, all changes made by a transaction are first saved in a log file, and then the result of the transaction is sent to the initiating client. Data files themselves are not changed on every transaction. This is a standard mechanism to prevent data loss in case of circumstances like operating system crash, hardware failure, or PostgreSQL crash. This mechanism is called Write Ahead Logging, and the log file is called Write Ahead Log (WAL).

Each change that the transaction performs (INSERT, UPDATE, DELETE, COMMIT) is written in the log as a WAL record. WAL records are first written into an in-memory WAL buffer. When the transaction is committed the records are written into a WAL segment file on the disk.

A WAL record’s Log Sequence Number (LSN) represents the location/position where the record is saved in the log file. LSN is used as a unique id of the WAL record. Logically, a transaction log is a file whose size is 2^64 bytes. LSN is therefore a 64-bit number represented as two 32-bit hexadecimal numbers separated by a /. For example:

select pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/2BDBBD0

(1 row)
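Since the two hexadecimal halves of an LSN map directly onto the high and low 32 bits of the byte position, the conversion can be sketched in a few lines of Python (an illustrative helper, not PostgreSQL code):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN string like '0/2BDBBD0' to a 64-bit byte position.

    The part before the slash is the high 32 bits and the part after it
    is the low 32 bits of the position in the 2^64-byte transaction log.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

print(hex(lsn_to_int("0/2BDBBD0")))  # 0x2bdbbd0
```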

 

If there is a system crash, the database can recover committed transactions from the WAL. Recovery starts from the last REDO point, or checkpoint. A checkpoint is a point in the transaction log at which all data files have been updated to match the information in the log. The process of flushing all data changes covered by the WAL records up to that point into the actual data files is called checkpointing.

Let’s consider a case where the database crashes after two transactions that perform one insert each and WAL is used for recovery.

1. Assume a checkpoint has been issued, which stores the location of the latest REDO point in the current WAL segment. This also flushes all dirty pages in the shared buffer pool to the disk. This action guarantees that WAL records before the REDO point are no longer needed for recovery, since all data has been flushed to the disk pages.

2. The first INSERT statement is issued. The table’s page is loaded from disk to the buffer pool.

3. A tuple is inserted into the loaded page.

4. The WAL record of this insert is saved into the WAL buffer at location LSN_1.

5. The page’s LSN is updated from LSN_0 to LSN_1, which identifies the WAL record for the last change to this page.

6. The first COMMIT statement is issued.

7. The WAL record of this commit action is written into the WAL buffer, and then all WAL records in the WAL buffer up to and including this commit record are flushed to the WAL segment file.

8. For the second INSERT and commit, steps 2 to 7 are repeated.

[Diagram: What is WAL?]

 

If there is an operating system crash, all the data on the shared buffer pool will be lost. However all modifications to the page have been written into the WAL segment files as history data. The following steps show how our database cluster can be restored back to the state immediately before the crash using WAL records. Nothing special needs to be done—PostgreSQL will automatically enter into recovery mode after restarting.

1. PostgreSQL reads the WAL record of the first INSERT statement from the appropriate WAL segment file.

2. PostgreSQL loads the table's page from the database cluster into the shared buffer pool.

3. PostgreSQL compares the WAL record's LSN (LSN_1) with the page LSN (LSN_0). Since LSN_1 is greater than LSN_0, the tuple in the WAL record is inserted into the page and the page's LSN is updated to LSN_1.

The remaining WAL records are replayed in a similar manner.
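The replay decision above boils down to an LSN comparison: a WAL record is applied only when it is newer than the page it targets. A minimal Python sketch of this rule, using a hypothetical list of (lsn, change) pairs to stand in for WAL records:

```python
def recover(page_lsn, wal_records):
    """Replay WAL records onto a page, skipping changes already on disk.

    wal_records is a list of (lsn, change) tuples in log order. A record
    is applied only if its LSN is greater than the page's LSN; the page
    LSN then advances to the record's LSN, exactly as in the steps above.
    """
    applied = []
    for lsn, change in wal_records:
        if lsn > page_lsn:        # record is newer than the page: apply it
            applied.append(change)
            page_lsn = lsn        # page now reflects this record
    return page_lsn, applied

# Page at LSN_0 = 0x100; the 0x100 record is already reflected, 0x200 is not.
print(recover(0x100, [(0x100, "insert A"), (0x200, "insert B")]))
```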

 

[Diagram: What is WAL? (2)]

 

7.2 Transaction Log and WAL Segment Files

The PostgreSQL transaction log is a virtual file with a capacity of 2^64 bytes (16 exabytes, hence the 8-byte LSN). Physically, the log is divided into 16-MB files, each of which is called a WAL segment.

A WAL segment file name is a 24-hexadecimal-digit number. The first 8 digits are the timeline ID; the remaining 16 digits identify the position of the segment within the transaction log.

[Diagram: WAL segment file naming rule]

 

Assuming that the current timeline ID is 0x00000001, the first WAL segment file names will be as shown below (spaces added for readability; with the default 16-MB segment size the last group runs only from 00000000 to 000000FF before the middle group increments):

00000001 00000000 00000000

00000001 00000000 00000001

00000001 00000000 00000002

……….

00000001 00000000 000000FF

00000001 00000001 00000000

00000001 00000001 00000001

…………

00000001 FFFFFFFF 000000FE

00000001 FFFFFFFF 000000FF

 

For example:

select pg_walfile_name('0/2BDBBD0');

     pg_walfile_name      

--------------------------

 000000010000000000000002
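The arithmetic behind pg_walfile_name can be reproduced in Python, assuming the default 16-MB segment size (a sketch only; it ignores the special case of an LSN sitting exactly on a segment boundary, where the server reports the previous segment):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024                  # default 16 MB segments
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE    # 256 segments per 4 GB of log

def walfile_name(timeline: int, lsn: str) -> str:
    """Map an LSN to its 24-hex-digit WAL segment file name."""
    high, low = lsn.split("/")
    pos = (int(high, 16) << 32) | int(low, 16)   # 64-bit byte position
    segno = pos // WAL_SEG_SIZE                  # which 16 MB segment
    return "%08X%08X%08X" % (timeline,
                             segno // SEGS_PER_XLOGID,
                             segno % SEGS_PER_XLOGID)

print(walfile_name(1, "0/2BDBBD0"))  # 000000010000000000000002
```

The result matches the pg_walfile_name output shown above for LSN 0/2BDBBD0.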

 

7.3 WAL Writer

The WAL writer is a background process that periodically checks the WAL buffer and writes any unwritten WAL records into the WAL segments. It avoids bursts of I/O activity by spreading this work out over time. The configuration parameter wal_writer_delay controls how often the WAL writer flushes the WAL; its default value is 200 ms.

 

7.4 WAL Segment File Management

WAL segment files are stored in the pg_wal subdirectory. PostgreSQL switches to a new WAL segment file under the following conditions:

  • The WAL segment has filled up.
  • The function pg_switch_wal() has been called.
  • archive_mode is enabled and the time set in archive_timeout has been exceeded.

After they are switched out, WAL files can either be removed or recycled, i.e., renamed and reused in the future. The number of WAL files that the server retains at any point in time depends on server configuration and server activity.

Whenever a checkpoint starts, PostgreSQL estimates the number of WAL segment files required for the coming checkpoint cycle. The estimate is based on the number of files consumed in previous checkpoint cycles, counted from the segment that contains the prior REDO point, and is kept between min_wal_size (by default 80 MB, i.e. 5 segments) and max_wal_size (by default 1 GB, i.e. 64 segments). When the checkpoint starts, the files still needed are held or recycled, while the unnecessary ones are removed.

An example is provided in the diagram below. Assume that there are six files before the checkpoint starts, the previous REDO point is contained in the file WAL_3, and PostgreSQL estimates that five files need to be retained. In this case, WAL_1 will be renamed WAL_7 for recycling, and WAL_2 will be removed.

[Diagram: WAL segment file management]
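The recycle-or-remove decision in the example above can be sketched in Python (a hypothetical manage_segments helper; the server's real bookkeeping is more involved):

```python
def manage_segments(files, redo_file, estimated_keep):
    """Sketch of checkpoint-time WAL segment management.

    Segments from the one holding the prior REDO point onward are kept.
    Of the older segments, enough are recycled (renamed for future reuse)
    to reach the estimated need; any surplus is removed.
    """
    redo_idx = files.index(redo_file)
    keep = files[redo_idx:]                      # still needed for recovery
    old = files[:redo_idx]                       # recycle/remove candidates
    n_recycle = max(0, estimated_keep - len(keep))
    recycled = old[:n_recycle]                   # renamed as future segments
    removed = old[n_recycle:]
    return keep, recycled, removed

files = ["WAL_1", "WAL_2", "WAL_3", "WAL_4", "WAL_5", "WAL_6"]
print(manage_segments(files, "WAL_3", 5))
# WAL_3..WAL_6 kept, WAL_1 recycled (becomes WAL_7), WAL_2 removed
```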

 

7.5 WAL Example

Step 1: SELECT datname, oid FROM pg_database WHERE datname = 'postgres';

datname  |  oid  

----------+-------

postgres | 15709

Note the database OID, i.e. 15709

 

Step 2: SELECT oid,* from pg_tablespace;

oid  |  spcname   | spcowner | spcacl | spcoptions 

------+------------+----------+--------+------------

1663 | pg_default |       10 |        | 

1664 | pg_global  |       10 |        | 

Note the table space OID, i.e. 1663

 

Step 3: SELECT pg_current_wal_lsn();

pg_current_wal_lsn 

--------------------

0/1C420B8

Note the LSN, i.e. 0/1C420B8

 

Step 4: CREATE TABLE abc(a VARCHAR(10));

 

Step 5: SELECT pg_relation_filepath('abc');

pg_relation_filepath 

----------------------

base/15709/16384

Note the relation filename, base/15709/16384

 

Step 6: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C420B8

using the start LSN noted in step 3.

Note that the WAL contains the instruction to create physical file

15709 → database postgres → noted in step 1

16384 → table abc → noted in step 5

 

rmgr     len(rec/tot)    tx   lsn         prev        desc
XLOG      30/ 30           0  0/01C420B8  0/01C42080  NEXTOID 24576
Storage   42/ 42           0  0/01C420D8  0/01C420B8  CREATE base/15709/16384
Heap     203/203        1216  0/01C42108  0/01C420D8  INSERT off 2, blkref #0: rel 1663/15709/1247 blk 0
Btree     64/ 64        1216  0/01C421D8  0/01C42108  INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree     64/ 64        1216  0/01C42218  0/01C421D8  INSERT_LEAF off 7, blkref #0: rel 1663/15709/2704 blk 5
Heap      80/ 80        1216  0/01C42258  0/01C42218  INSERT off 30, blkref #0: rel 1663/15709/2608 blk 9
Btree     72/ 72        1216  0/01C422A8  0/01C42258  INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree     72/ 72        1216  0/01C422F0  0/01C422A8  INSERT_LEAF off 170, blkref #0: rel 1663/15709/2674 blk 61
Heap     203/203        1216  0/01C42338  0/01C422F0  INSERT off 6, blkref #0: rel 1663/15709/1247 blk 1
Btree     64/ 64        1216  0/01C42408  0/01C42338  INSERT_LEAF off 298, blkref #0: rel 1663/15709/2703 blk 2
Btree     72/ 72        1216  0/01C42448  0/01C42408  INSERT_LEAF off 3, blkref #0: rel 1663/15709/2704 blk 1
Heap      80/ 80        1216  0/01C42490  0/01C42448  INSERT off 36, blkref #0: rel 1663/15709/2608 blk 9
Btree     72/ 72        1216  0/01C424E0  0/01C42490  INSERT_LEAF off 243, blkref #0: rel 1663/15709/2673 blk 51
Btree     72/ 72        1216  0/01C42528  0/01C424E0  INSERT_LEAF off 97, blkref #0: rel 1663/15709/2674 blk 57
Heap     199/199        1216  0/01C42570  0/01C42528  INSERT off 2, blkref #0: rel 1663/15709/1259 blk 0
Btree     64/ 64        1216  0/01C42638  0/01C42570  INSERT_LEAF off 257, blkref #0: rel 1663/15709/2662 blk 2
Btree     64/ 64        1216  0/01C42678  0/01C42638  INSERT_LEAF off 8, blkref #0: rel 1663/15709/2663 blk 1
Btree     64/ 64        1216  0/01C426B8  0/01C42678  INSERT_LEAF off 217, blkref #0: rel 1663/15709/3455 blk 5
Heap     171/171        1216  0/01C426F8  0/01C426B8  INSERT off 53, blkref #0: rel 1663/15709/1249 blk 16
Btree     64/ 64        1216  0/01C427A8  0/01C426F8  INSERT_LEAF off 185, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C427E8  0/01C427A8  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42828  0/01C427E8  INSERT off 54, blkref #0: rel 1663/15709/1249 blk 16
Btree     72/ 72        1216  0/01C428D8  0/01C42828  INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42920  0/01C428D8  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42960  0/01C42920  INSERT off 55, blkref #0: rel 1663/15709/1249 blk 16
Btree     72/ 72        1216  0/01C42A10  0/01C42960  INSERT_LEAF off 187, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42A58  0/01C42A10  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42A98  0/01C42A58  INSERT off 1, blkref #0: rel 1663/15709/1249 blk 17
Btree     72/ 72        1216  0/01C42B48  0/01C42A98  INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42B90  0/01C42B48  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42BD0  0/01C42B90  INSERT off 3, blkref #0: rel 1663/15709/1249 blk 17
Btree     72/ 72        1216  0/01C42C80  0/01C42BD0  INSERT_LEAF off 188, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42CC8  0/01C42C80  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42D08  0/01C42CC8  INSERT off 5, blkref #0: rel 1663/15709/1249 blk 17
Btree     72/ 72        1216  0/01C42DB8  0/01C42D08  INSERT_LEAF off 186, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42E00  0/01C42DB8  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap     171/171        1216  0/01C42E40  0/01C42E00  INSERT off 30, blkref #0: rel 1663/15709/1249 blk 32
Btree     72/ 72        1216  0/01C42EF0  0/01C42E40  INSERT_LEAF off 189, blkref #0: rel 1663/15709/2658 blk 25
Btree     64/ 64        1216  0/01C42F38  0/01C42EF0  INSERT_LEAF off 194, blkref #0: rel 1663/15709/2659 blk 16
Heap      80/ 80        1216  0/01C42F78  0/01C42F38  INSERT off 25, blkref #0: rel 1663/15709/2608 blk 11
Btree     72/ 72        1216  0/01C42FC8  0/01C42F78  INSERT_LEAF off 131, blkref #0: rel 1663/15709/2673 blk 44
Btree     72/ 72        1216  0/01C43010  0/01C42FC8  INSERT_LEAF off 66, blkref #0: rel 1663/15709/2674 blk 46
Standby   42/ 42        1216  0/01C43058  0/01C43010  LOCK xid 1216 db 15709 rel 16384
Txn      405/405        1216  0/01C43088  0/01C43058  COMMIT 2019-03-04 07:42:23.165514 EST;... snapshot 2608 relcache 16384
Standby   50/ 50           0  0/01C43220  0/01C43088  RUNNING_XACTS nextXid 1217 latestCompletedXid 1216 oldestRunningXid 1217
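Each record's prev field points at the LSN of the preceding record, so a pg_waldump listing forms a backward-linked chain. A quick Python sketch that validates this property on the first few records of the dump above:

```python
def check_chain(records):
    """Verify that each WAL record's 'prev' field matches the LSN of the
    record immediately before it in the dump."""
    for earlier, later in zip(records, records[1:]):
        if later["prev"] != earlier["lsn"]:
            return False
    return True

records = [  # first three records from the dump above
    {"lsn": "0/01C420B8", "prev": "0/01C42080"},
    {"lsn": "0/01C420D8", "prev": "0/01C420B8"},
    {"lsn": "0/01C42108", "prev": "0/01C420D8"},
]
print(check_chain(records))  # True
```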

 

Step 7: SELECT pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/1C43258

(1 row)

 

Step 8: INSERT INTO abc VALUES('pkn');

 

Step 9: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C43258

using the start LSN from step 7.

1663 → pg_default tablespace → noted in step 2

15709 → database postgres → noted in step 1

16384 → table abc → noted in step 5

rmgr         len(rec/tot)    tx   lsn         prev        desc
Heap          59/ 59       1217  0/01C43258  0/01C43220  INSERT+INIT off 1, blkref #0: rel 1663/15709/16384 blk 0
Transaction   34/ 34       1217  0/01C43298  0/01C43258  COMMIT 2019-03-04 07:43:45.887511 EST
Standby       54/ 54          0  0/01C432C0  0/01C43298  RUNNING_XACTS nextXid 1218 latestCompletedXid 1216 oldestRunningXid 1217; 1 xacts: 1217

 

 

Step 10: SELECT pg_current_wal_lsn();

 pg_current_wal_lsn 

--------------------

 0/1C432F8

(1 row)

 

Step 11: INSERT INTO abc VALUES('ujy');

 

Step 12: ./pg_waldump --path=/tmp/sd/pg_wal --start=0/1C432F8

using the start LSN from step 10.

rmgr         len(rec/tot)    tx   lsn         prev        desc
Heap          59/ 59       1218  0/01C432F8  0/01C432C0  INSERT off 2, blkref #0: rel 1663/15709/16384 blk 0
Transaction   34/ 34       1218  0/01C43338  0/01C432F8  COMMIT 2019-03-04 07:44:25.449151 EST
Standby       50/ 50          0  0/01C43360  0/01C43338  RUNNING_XACTS nextXid 1219 latestCompletedXid 1218 oldestRunningXid 1219

 

Step 13: Check the actual tuples in the WAL segment files. The values 'pkn' and 'ujy' inserted in steps 8 and 11 are visible in the ASCII column.

---------+---------------------------------------------------+----------------+

 Offset  | Hex Bytes                                         | ASCII chars    |

---------+---------------------------------------------------+----------------+

00000060 | 3b 00 00 00 c3 04 00 00  28 00 40 02 00 00 00 00  |;.......(.@.....|

00000070 | 00 0a 00 00 ec 28 75 6e  00 20 0a 00 7f 06 00 00  |.....(un. ......|

00000080 | 5d 3d 00 00 00 40 00 00  00 00 00 00 ff 03 01 00  |]=...@..........|

00000090 | 02 08 18 00 09 70 6b 6e  03 00 00 00 00 00 00 00  |.....pkn........|

000000a0 | 22 00 00 00 c3 04 00 00  60 00 40 02 00 00 00 00  |".......`.@.....|

000000b0 | 00 01 00 00 dd 4c 87 04  ff 08 e4 73 44 e7 41 26  |.....L.....sD.A&|

000000c0 | 02 00 00 00 00 00 00 00  32 00 00 00 00 00 00 00  |........2.......|

000000d0 | a0 00 40 02 00 00 00 00  10 08 00 00 9e 01 36 88  |..@...........6.|

000000e0 | ff 18 00 00 00 00 00 00  00 00 00 03 00 00 c4 04  |................|

000000f0 | 00 00 c4 04 00 00 c3 04  00 00 00 00 00 00 00 00  |................|



00000100 | 3b 00 00 00 c4 04 00 00  c8 00 40 02 00 00 00 00  |;.........@.....|

00000110 | 00 0a 00 00 33 df b4 71  00 20 0a 00 7f 06 00 00  |....3..q. ......|

00000120 | 5d 3d 00 00 00 40 00 00  00 00 00 00 ff 03 01 00  |]=...@..........|

00000130 | 02 08 18 00 09 75 6a 79  04 00 00 00 00 00 00 00  |.....ujy........|

00000140 | 22 00 00 00 c4 04 00 00  00 01 40 02 00 00 00 00  |".........@.....|

00000150 | 00 01 00 00 96 2e 96 a6  ff 08 d8 f3 79 ed 41 26  |............y.A&|

00000160 | 02 00 00 00 00 00 00 00  32 00 00 00 00 00 00 00  |........2.......|

00000170 | 40 01 40 02 00 00 00 00  10 08 00 00 eb 6b 95 36  |@.@..........k.6|

00000180 | ff 18 00 00 00 00 00 00  00 00 00 03 00 00 c5 04  |................|

00000190 | 00 00 c5 04 00 00 c4 04  00 00 00 00 00 00 00 00  |................|

000001a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
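Row values are stored verbatim inside WAL records, which is why 'pkn' and 'ujy' are readable in the ASCII column above. The same check can be sketched in Python over an in-memory stand-in for a slice of a segment file (the byte string below is illustrative, echoing the dump above, not a real segment):

```python
def find_ascii(payload: bytes, needle: bytes):
    """Return every offset at which `needle` occurs in a raw byte buffer."""
    offsets, start = [], 0
    while (i := payload.find(needle, start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets

# Illustrative bytes echoing the 0x90 line of the dump: tuple header
# bytes followed by the inserted value 'pkn'.
segment = b"\x02\x08\x18\x00\x09" + b"pkn" + b"\x03" + b"\x00" * 8
print(find_ascii(segment, b"pkn"))  # [5]
```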

 

 

7.6 Overview of Replication Options based on WAL

Continuous WAL Archiving

Copying WAL files, as they are generated, to a location other than the pg_wal subdirectory for the purpose of archiving them is called WAL archiving. PostgreSQL invokes a user-provided archiving script each time a WAL file is generated. The script can use the scp command to copy the file to one or more locations, such as an NFS mount. Once archived, the WAL segment files can be used to recover the database to any specified point in time.

Log Shipping Based Replication — File Level

Copying WAL files to another PostgreSQL server for the purpose of creating a standby server that replays them is called Log Shipping. The standby server is configured to be in recovery mode, and its sole purpose is to apply any new WAL files as they arrive. This second server then becomes a warm backup of the primary PostgreSQL server. The standby can also be configured to serve read-only queries, in which case it is called a hot standby.

Log Shipping Based Replication — Block Level

Streaming replication improves the log shipping process. Instead of waiting for the WAL switch, the records are sent as they are generated, thus reducing replication delay. Another improvement is that the standby server will connect to the primary server over the network using a replication protocol. The primary server can then send WAL records directly over this connection without having to rely on scripts provided by the end user.

How long should the primary server retain WAL segment files?

If there aren’t any streaming replication clients, the server can discard or recycle a WAL segment file once the archive script reports success, provided the file is no longer required for crash recovery.

The presence of standby clients, though, creates a problem: the server needs to keep WAL files around for as long as the slowest standby needs them. If, for example, the standby was taken down for a while, then comes back online and asks the primary for a WAL file that the primary no longer has, the replication fails with an error similar to:

ERROR: requested WAL segment 00000001000000010000002D has already been removed

The primary server should therefore keep track of how far behind the standby is, and not delete/recycle WAL files that any standbys still need. This feature is provided through replication slots.

Each replication slot has a name which is used to identify the slot. Each slot is associated with:

(a) The oldest WAL segment file required by the consumer of the slot. WAL segment files more recent than this are not deleted/recycled during checkpoints.

(b) The oldest transaction ID required to be retained by the consumer of the slot. Rows needed by any transactions more recent than this are not deleted by vacuum.

 

8. Log Shipping Based Replication

8.1 Physical Streaming Replication

[Diagram: Physical streaming replication]

In streaming replication the standby server connects to the primary server and receives WAL records using a replication protocol. This provides two advantages: 

1. The standby server does not need to wait for the WAL file to fill up, which improves replication lag.

2. The dependency on the user-provided script and an intermediate shared storage between the servers is removed.

 

8.2 WAL Sender and WAL Receiver

A process called the WAL receiver, running on the standby server, connects to the primary server over TCP/IP using the connection details provided in the primary_conninfo parameter of recovery.conf. On the primary server, another process called the WAL sender sends the WAL records to the standby server as they are generated. The WAL receiver writes the WAL records to the standby's own WAL segment files as if they had been generated by locally connected client activity. Once the WAL records reach the WAL segment files, the standby server continuously replays the WAL so that the standby and the primary stay up to date.

8.3 WAL Streaming Protocol Details

[Diagram: WAL streaming protocol details]

 

8.4 Setup

The setup consists of two CentOS 7 machines connected via LAN on which PostgreSQL version 10.7 is installed.

8.5 Configuring PostgreSQL Replication using WAL Streaming

 

Step 1: Disable and stop firewall on both the machines:

sudo firewall-cmd --state

sudo systemctl stop firewalld

sudo systemctl disable firewalld

sudo systemctl mask --now firewalld

 

Step 2: On the primary server, allow replication connections and connections from the same network. Modify pg_hba.conf:

local  all           all                       md5

host   all           all   172.16.214.167/24   md5

host   all           all   ::1/128             md5

local  replication   all                       md5

host   replication   all   172.16.214.167/24   md5

host   replication   all   ::1/128             md5

 

Step 3: On the primary server, edit postgresql.conf to modify the following parameters:

max_wal_senders = 10

wal_level = replica

max_replication_slots = 10

synchronous_commit = on

synchronous_standby_names = '*'

listen_addresses = '*'

 

Step 4: Start the primary server: 

./postgres -D ../pr_data -p 5432

 

Step 5: Take base backup to bootstrap the standby server: 

./pg_basebackup \
  --pgdata=/tmp/sb_data/ \
  --format=p \
  --write-recovery-conf \
  --checkpoint=fast \
  --label=mffb \
  --progress \
  --verbose \
  --host=172.16.214.167 \
  --port=5432 \
  --username=postgres

 

Step 6: Check the base backup label file:

START WAL LOCATION: 0/2000028 (file 000000010000000000000002)

CHECKPOINT LOCATION: 0/2000060

BACKUP METHOD: streamed

BACKUP FROM: master

START TIME: 2019-02-24 05:25:30 EST

LABEL: mffb

 

Step 7: In the base backup, add the following line in the recovery.conf:

primary_slot_name = 'node_a_slot'

 

Step 8: Check the /tmp/sb_data/recovery.conf file

standby_mode = 'on'

primary_conninfo = 'user=enterprisedb password=abc123 host=172.16.214.167 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'

primary_slot_name = 'node_a_slot'

 

Step 9: Connect to the primary server and issue this command:

edb=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');

      slot_name  | xlog_position 

    -------------+---------------

     node_a_slot | 

    (1 row)



edb=# SELECT slot_name, slot_type, active FROM pg_replication_slots;

      slot_name  | slot_type | active 

    -------------+-----------+--------

     node_a_slot | physical  | f

    (1 row)

 

Step 10: Transfer the base backup to the standby server: 

scp /tmp/sb_data.tar.gz abbas@172.16.214.166:/tmp

sudo mv /tmp/sb_data /opt/PostgreSQL/10/

sudo chown postgres:postgres /opt/PostgreSQL/10/sb_data/

sudo chown -R postgres:postgres /opt/PostgreSQL/10/sb_data/

sudo chmod 700 /opt/PostgreSQL/10/sb_data/

 

Step 11: Start the standby server:

./postgres -D ../sb_data/ -p 5432

 

The primary will show this in the log:

LOG:  standby "walreceiver" is now a synchronous standby with priority 1

 

The standby will show:

LOG:  database system was interrupted; last known up at 2018-10-24 15:49:55

LOG:  entering standby mode

LOG:  redo starts at 0/3000028

LOG:  consistent recovery state reached at 0/30000F8

LOG:  started streaming WAL from primary at 0/4000000 on timeline 1

 

Step 12: Connect to primary server and issue some simple commands:

-bash-4.2$ ./edb-psql -p 5432 edb

Password: 

psql.bin (10.7)

Type "help" for help.



create table abc(a int, b varchar(250));

insert into abc values(1,'One');

insert into abc values(2,'Two');

insert into abc values(3,'Three');

 

Step 13: Check the data on the replica:

./psql -p 5432 -U postgres postgres

Password for user postgres: 

psql.bin (10.7)

Type "help" for help.

postgres=# select * from abc;

a |   b   

---+-------

1 | One

2 | Two

3 | Three

(3 rows)

 

8.6 Steps to perform Failover

Step 1: Crash the primary server

Step 2: Promote the standby server by running the following command on the standby server:

./pg_ctl promote -D ../sb_data/

server promoting

 

Step 3: Connect to the promoted standby server and insert a row:

-bash-4.2$ ./edb-psql -p 5432 edb

Password: 

psql.bin (10.7)

Type "help" for help.



edb=# insert into abc values(4,'Four');

The fact that this insert works means that the standby, which is otherwise a read-only server, has been promoted to be the new primary server.

In this article we have described how PostgreSQL can be configured for failover using physical streaming replication, along with the necessary background information.
