EDB Failover Manager

High availability for Postgres

EDB Failover Manager (EFM) provides the high availability infrastructure for EDB Postgres. EFM monitors the members of a Postgres cluster, identifies and verifies database failures quickly and reliably, and, if needed, promotes a standby node to become the cluster master and issues alerts.


EDB Failover Manager Architecture

[Figure: EDB Failover Manager architecture diagram]

High Availability for EDB Postgres

PostgreSQL and EDB Advanced Server achieve high availability through shared disk clusters or streaming replication clusters. Shared disk clusters rely on custom hardware or underlying operating system capabilities. Streaming replication clusters leverage key Postgres capabilities to create highly redundant architectures of masters with local or distributed replicas.
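As background, the streaming replica itself is configured in Postgres, not in EFM. A minimal sketch for Postgres 12 or later might look like the following; the host name and replication user are placeholders, and with a standby.signal file present in the data directory these settings make the server stream from the primary:

```ini
# Illustrative standby settings (postgresql.auto.conf on the replica).
# Host, port, and user are placeholders for your environment.
primary_conninfo = 'host=primary-host port=5432 user=replicator'
# Allow read-only queries on the standby (hot standby mode):
hot_standby = on
```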

EDB Failover Manager (EFM) provides the cluster health monitoring, node/database failure detection, and automatic failover mechanisms needed to build high availability solutions with stringent uptime ("nines") requirements.

Features include:

  • Application-transparent automatic failover from a downed master database to a promoted streaming-replica standby, with no changes required to your application code. The streaming replica may run in either warm or hot standby mode.
  • A witness node architecture to prevent false failovers and ‘split brain’ scenarios, in which two nodes in the cluster both ‘think’ they are the master, a situation that could otherwise lead to data corruption.
  • A variety of health checks that guard against unnecessary or false failovers.
  • Multiple configurable failover detection and failover options, including manual failover by an administrator.
  • Configurable 'fencing' operations (by virtual IP or load balancers) that ensure failed nodes don't accidentally rejoin the HA cluster, which could otherwise cause data loss.
  • Automatic email notifications to alert the administrator of changes in cluster health conditions and keep the DBA informed at all stages of the failover process.
  • Controlled switchover and switchback to support maintenance operations and to test disaster recovery procedures.


EFM Agents, EFM Clusters, and EFM Witness

EFM deploys an agent on every node of the Postgres cluster (master and replicas). The agents communicate with each other via TCP/IP, and with the database server on their own node via JDBC. Each agent regularly checks whether the master database is alive; if a check fails, the other agents in the cluster verify that the database really is down or unreachable before the cluster decides to promote a replica to become the new master and to realign all remaining replicas with it.

If a cluster consists of only two nodes, a lightweight witness node is installed on a separate server to avoid false failover situations.
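For illustration, each agent reads its settings from a properties file. The sketch below follows the efm.properties format, but the property names and values are an approximation from memory, not a definitive reference; check the EFM documentation for your version:

```ini
# Sketch of an efm.properties-style agent configuration (illustrative only).
# Connection the agent uses to monitor its local database:
db.user=enterprisedb
db.port=5444
db.database=edb
# Address this agent binds for TCP/IP messaging with the other agents:
bind.address=10.0.0.11:7800
# Set to true only on the lightweight witness node:
is.witness=false
# Virtual IP assigned to whichever node is currently the master:
virtual.ip=10.0.0.50
# Address for failover notification emails:
user.email=dba@example.com
```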

VIP, Custom Fencing Scripts, and Integration with pgPool

By default, EFM uses a virtual IP address assignment mechanism to identify which member of the cluster is currently the master and can therefore receive read/write transactions. This mechanism, like many of the others (fencing, post-promotion actions, database-failure handling, isolation of the master, and notifications), is fully customizable.
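As a sketch of what a custom fencing hook might look like, the script below releases the virtual IP on a failed master and blocks its database traffic so it cannot silently rejoin the cluster. The function name, arguments, interface name, and the idea of wiring it in via a fence-script property are illustrative assumptions, not EFM's documented interface:

```shell
#!/bin/sh
# Hypothetical fencing hook (illustrative sketch, not EFM's actual API).
# Usage: fence_node <failed-node-ip> <virtual-ip>
# With DRY_RUN=1 the intended actions are printed instead of executed.
fence_node() {
    failed_node="$1"
    vip="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would release VIP $vip"
        echo "would block traffic from $failed_node on port 5432"
        return 0
    fi
    # Release the virtual IP so clients can no longer reach the failed
    # master (interface name eth0 and /24 prefix are assumptions):
    ip addr del "$vip/24" dev eth0
    # Block Postgres traffic from the failed node so it cannot rejoin:
    iptables -A INPUT -s "$failed_node" -p tcp --dport 5432 -j DROP
}
```

Running the hook with DRY_RUN=1 is a convenient way to verify the fencing logic before pointing EFM at it in production.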

These scripting mechanisms can also be used to integrate EFM with pgPool, yielding a solution that is both highly available and read-scalable.
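One way such an integration might work is a post-failover hook that detaches the failed backend from pgPool-II so the load balancer stops routing queries to it. pcp_detach_node is a real pgPool-II administration utility, but the function name, host, port, user, and wiring into EFM below are placeholders for illustration:

```shell
#!/bin/sh
# Hypothetical post-failover hook (sketch): detach a failed backend
# from pgPool-II. With DRY_RUN=1 the action is printed, not executed.
# Usage: detach_from_pgpool <pgpool-backend-node-id>
detach_from_pgpool() {
    node_id="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would detach pgpool backend $node_id"
        return 0
    fi
    # Host, PCP port, and user are placeholders; -w reads the password
    # from the .pcppass file instead of prompting.
    pcp_detach_node -h pgpool-host -p 9898 -U pcpadmin -w "$node_id"
}
```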