In this blog post, we will walk you through the concept of building a high availability architecture with Postgres, based on the presentation given by Mike Sijmons of Nibble IT at Postgres Build 2020. The session discussed the end-to-end workflow of setting up a high availability system—from requirements gathering to system design to tooling.
Users generally expect the Postgres ecosystem to meet a common set of requirements, and the major ones hinge on how the application behaves during downtime or failover: after an accidental failure there should be no data loss and no need for reconfiguration. Detecting and responding to an unexpected shutdown should also be automated; to ensure high availability, the monitoring system must be able to make quick, smart decisions without any human intervention. Preventing errors is always preferable to solving them, so live monitoring checks on the running database are encouraged to catch problems before they turn into outages.
There are some other requirements, such as data auditing, encryption, and automated deployment to multiple environments like Test, Development, UAT, and Production. Automatic maintenance, including post-build cleanup tasks (for example, via the autovacuum daemon in the Postgres stack) and log management, is also often seen as an important requirement.
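As a sketch of what that automatic maintenance looks like in practice: autovacuum is enabled by default in Postgres, and its behavior can be tuned in postgresql.conf. The specific values below are illustrative, not recommendations from the session.

```
# postgresql.conf — autovacuum cleanup tuning (values are illustrative)
autovacuum = on                         # enabled by default; the daemon reclaims dead tuples
autovacuum_naptime = 1min               # minimum delay between autovacuum runs per database
autovacuum_vacuum_scale_factor = 0.1    # vacuum a table after ~10% of its rows change
log_autovacuum_min_duration = 250ms     # log autovacuum actions slower than this, for monitoring
```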
Now moving towards the system design, it is recommended to consider the following points:
Postgres Streaming Replication
First, a pair of primary and secondary databases is placed in one data center, with two additional secondary databases in another data center. To support a zero RPO (Recovery Point Objective), synchronous replication is used.
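A minimal sketch of how synchronous replication toward a zero RPO can be configured on the primary is shown below; the standby names are hypothetical placeholders, not the names from the session.

```
# postgresql.conf on the primary (standby names are examples)
wal_level = replica
max_wal_senders = 10            # allow enough WAL sender processes for all standbys
synchronous_commit = on         # a commit waits for confirmation from a synchronous standby
# require acknowledgement from at least one of the listed standbys
synchronous_standby_names = 'ANY 1 (standby1, standby2, standby3)'
```

With `synchronous_commit = on`, a transaction is not reported as committed until its WAL has been confirmed by a synchronous standby, which is what closes the window for data loss on failover.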
Most importantly, extra tooling is required to manage the HA cluster, since Postgres itself cannot handle accidental failures automatically. Failover management is therefore handled by a tool called repmgr (Replication Manager), with an agent, the repmgr daemon (repmgrd), running on every node of the HA cluster.
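To make this concrete, a minimal repmgr.conf for one node might look like the following; the node name, paths, and connection details are hypothetical, not taken from the session.

```
# /etc/repmgr.conf on one cluster node (all values are illustrative)
node_id = 1
node_name = 'node1'
conninfo = 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory = '/var/lib/postgresql/data'
failover = 'automatic'            # let repmgrd promote a standby when the primary fails
promote_command = '/usr/bin/repmgr standby promote -f /etc/repmgr.conf'
follow_command = '/usr/bin/repmgr standby follow -f /etc/repmgr.conf'
```

With `failover = 'automatic'`, the repmgrd agents monitor the primary and, on failure, run the promote and follow commands without human intervention.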
Witness and Backup Server
The witness server is required to detect failover conditions and unexpected shutdowns more reliably. It is recommended to host this server externally, in a third data center or in the cloud. In addition to these servers, Barman is used to configure the backup server, which is responsible for tasks such as data archiving and backups.
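As a sketch, a Barman server definition for backing up the primary might look like this; the server name, hosts, and retention policy are illustrative assumptions, not details from the session.

```
; /etc/barman.d/pg.conf — one server definition (names and values illustrative)
[pg]
description = "Primary Postgres server"
conninfo = host=pg-primary user=barman dbname=postgres
streaming_conninfo = host=pg-primary user=streaming_barman
backup_method = postgres          ; take base backups via pg_basebackup
streaming_archiver = on           ; receive WAL continuously over streaming replication
slot_name = barman                ; replication slot so WAL is retained for the archiver
retention_policy = RECOVERY WINDOW OF 7 DAYS
```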
There are a few more underlying components, such as HAProxy for routing and automatic reconfiguration, and resource monitoring tools like Grafana and Prometheus for performance insight and metrics.
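One common pattern for the routing piece is an HAProxy frontend that forwards client connections to whichever node a health check reports as primary. The fragment below is a generic sketch: the hostnames, ports, and the HTTP check endpoint (which would have to be served by an agent on each node) are all assumptions, not the configuration shown in the session.

```
# haproxy.cfg fragment — route write traffic to the current primary
# (hostnames, ports, and the health-check endpoint are illustrative)
listen postgres_write
    bind *:5000
    mode tcp
    option httpchk GET /primary         # an external agent must serve this endpoint
    http-check expect status 200        # only the node reporting "primary" passes
    default-server inter 3s fall 3 rise 2
    server node1 node1:5432 check port 8008
    server node2 node2:5432 check port 8008
    server node3 node3:5432 check port 8008
```

After a failover, the health check starts passing on the newly promoted node, so clients reconnect through the same address with no reconfiguration on their side.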
How Postgres High Availability Works
Nibble IT’s session included a demonstration of this high availability setup in action. The demo showed how different services can be managed from the operator node window that comes with the Postgres HA setup. To demonstrate the functionality, an intentional failover was triggered from the operator menu, with the status of all the servers and the transactional data shown throughout.
In summary, achieving high availability comes down to the underlying design and its implementation: reliable tools with dedicated functionality, scalable configurations, and automatic testing and deployment.
To learn more about true Postgres high availability and see the full demonstration, watch the session from Postgres Build 2020 on demand here!