What Does "Database High Availability" Really Mean?

Vibhor Kumar April 7, 2020

 

For many of our customers, high availability is a key concern. Their architects spend a lot of time designing and planning for high availability of applications and databases, because it is essential for business continuity: even a short downtime can lead to lost business. That is what led me to write this blog.

If you search for high availability, you will find many definitions. Wikipedia gives the following one:

High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

Key Principles of High Availability

The following are the key principles of High Availability:

  1. Eliminate any single point of failure: Add redundancy so that the failure of any one part of the system does not lead to the collapse of the entire system.
  2. Reliable crossover: In a redundant system, the crossover point itself tends to become a single point of failure. Fault-tolerant systems must provide a reliable crossover or automatic switchover mechanism to avoid failure.
  3. Detection of failures: Failures must be detected as they occur. If the above two principles are in place and proactively monitored, a user may never see a system failure.

EDB Postgres has building blocks for covering all of the above key principles.

 

  • Elimination of single points of failure - Postgres supports the following types of physical standbys:
    • Cold Standby - A backup server that holds backups and all the WAL files necessary for recovery. By definition, this system is not up and running, but it can be made available when needed. Backup servers and WAL files are mainly used to create a new PostgreSQL node as part of disaster recovery.
    • Warm Standby - In Warm Standby mode, Postgres runs in recovery mode and receives updates from archived log files or through Postgres log shipping replication. In this mode, Postgres does not accept connections or queries.
    • Hot Standby - In Hot Standby mode, Postgres runs in recovery mode and receives updates from archived log files or through log shipping replication. In this mode, Postgres accepts connections and read-only queries.

Any of the above can help eliminate single points of failure. Which one to choose depends on the agreed level of performance/uptime. Since Postgres 9.0, the most popular standby mode has been Hot Standby.

  • Reliable crossover - For a reliable crossover, i.e., switching between the master and standby nodes, EDB provides a technology called EDB Postgres Failover Manager (EFM). It enables automatic failover from the Postgres master node to a standby node in case of a software or hardware failure on the master. EFM uses JGroups, which provides a reliable, distributed, and redundant infrastructure without a single point of failure.
  • Detection of failures - EDB Postgres Failover Manager continuously monitors the servers and detects failures. It also executes the failover from the master to one of the replicas, so that the system becomes available again for accepting database connections and executing queries. Properly configured, EFM can detect a failure and execute a failover within a few seconds.
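
To make the detection principle concrete, here is a minimal Python sketch of one common approach: declare a node failed only after several consecutive health checks have been missed, so that a single dropped probe does not trigger a failover. This is an illustration only and not how EFM is implemented; the function name, the `max_missed` threshold, and the list of probe results are all assumptions made for the example.

```python
from typing import Iterable, Optional


def first_failure_index(probe_results: Iterable[bool],
                        max_missed: int = 3) -> Optional[int]:
    """Return the index of the probe at which the node is declared failed.

    The node is declared failed only after `max_missed` consecutive
    unsuccessful probes. Returns None if that never happens.
    (Hypothetical helper for illustration; not part of EFM.)
    """
    missed = 0
    for index, ok in enumerate(probe_results):
        if ok:
            missed = 0  # any successful probe resets the counter
        else:
            missed += 1
            if missed >= max_missed:
                return index
    return None


# True = health check succeeded, False = health check missed.
history = [True, False, True, False, False, False]
failed_at = first_failure_index(history)
```

The threshold is the trade-off to tune: a lower value detects failures sooner, while a higher value avoids failing over on a brief network hiccup.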

Combining all of the above helps achieve high availability of EDB Postgres within a data center or across data centers. If you are a cloud user, you can have high availability within a region (across multiple zones) or across regions (using a backplane network supported by the cloud vendor). For a detailed walkthrough of the questions you need to ask when designing highly available databases, watch our on-demand webinar.

 

PostgreSQL Database Uptime and Availability

Uptime and availability are generally used synonymously. To achieve high availability and maintain the agreed uptime, architects work to reduce outages/downtime.
Service outages come in two main flavors:

  1. Planned outages
  2. Unplanned outages

Some people refer to them as Scheduled and Unscheduled downtime.

  • Planned outage/Scheduled downtime - A planned outage is the result of maintenance activities that disrupt system operation and usually cannot be avoided, such as patches to system software that require a reboot or a database restart. In general, a planned outage is the result of some logical, management-initiated event.
  • Unplanned outage/Unscheduled downtime - An unplanned outage is the result of physical failures or events, such as a hardware or software failure or an environmental anomaly. Examples include power outages, failed CPU or RAM components (or other hardware component failures), network failures, security breaches, and failures of applications, middleware, or the operating system.

In both kinds of outage, EDB Postgres Failover Manager can help minimize the downtime. For a planned outage/scheduled downtime, a user/DBA can first patch all the standbys and then use EDB Postgres Failover Manager to perform a switchover before patching the master (primary) node.
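
The ordering in that procedure is what keeps downtime low, and it can be sketched in a few lines of Python. This is only an illustration of the sequence described above: the node names are hypothetical, the steps are plain descriptions rather than EFM commands, and promoting the first standby in the list is an assumption made for the example.

```python
def rolling_patch_plan(primary, standbys):
    """Return the ordered steps for patching a cluster with minimal downtime.

    Standbys are patched first, then the primary role is switched over to
    an already-patched standby, and only then is the old primary patched.
    (Hypothetical helper for illustration; not an EFM API.)
    """
    if not standbys:
        raise ValueError("a switchover needs at least one standby")
    steps = [f"patch standby {name}" for name in standbys]
    new_primary = standbys[0]  # assumption: promote the first standby listed
    steps.append(f"switch over from {primary} to {new_primary}")
    steps.append(f"patch former primary {primary}")
    return steps


plan = rolling_patch_plan("pg1", ["pg2", "pg3"])
```

With this ordering, the only interruption the application sees is the switchover itself, because every node is patched while it is not serving as the primary.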

For an unplanned outage/unscheduled downtime, EDB Postgres Failover Manager can detect the failure, fail over to the appropriate standby, and make it the new master, which can then accept read/write connections and provide database services to the application. EDB Postgres Failover Manager also makes sure that the old master/primary doesn’t come back after the failover, to avoid a split-brain situation.

With EDB Postgres Failover Manager in place, an architect who wants to further reduce the unavailability of their applications can also use the multi-host connection support of the JDBC driver or libpq. For example, in libpq URI form:

postgresql://host1:123,host2:456/somedb?target_session_attrs=read-write&application_name=myapp

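With a connection string like this, the client tries each listed host in turn and connects to the one that accepts read-write sessions. After a failover, the application therefore reconnects to the new master/primary without any configuration change, which makes the failover largely transparent to the application.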
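
As a rough illustration of what the driver does with such a connection string, the Python sketch below splits the URI into its host list and then picks the first host that accepts writes. It is a simplified model and not libpq's or the JDBC driver's actual implementation: the function names are hypothetical, the `accepts_writes` callback stands in for a real connection attempt, and user names or passwords in the URI are not handled.

```python
from urllib.parse import parse_qs, urlsplit


def parse_multi_host_uri(uri):
    """Split a multi-host PostgreSQL URI into (hosts, dbname, params)."""
    parts = urlsplit(uri)
    hosts = []
    for entry in parts.netloc.split(","):
        host, _, port = entry.partition(":")
        # 5432 is the default PostgreSQL port when none is given.
        hosts.append((host, int(port) if port else 5432))
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return hosts, parts.path.lstrip("/"), params


def first_read_write_host(hosts, accepts_writes):
    """Return the first host for which accepts_writes(host) is true.

    `accepts_writes` is a hypothetical stand-in for connecting to the
    host and checking that the session is read-write.
    """
    for host in hosts:
        if accepts_writes(host):
            return host
    return None


uri = ("postgresql://host1:123,host2:456/somedb"
       "?target_session_attrs=read-write&application_name=myapp")
hosts, dbname, params = parse_multi_host_uri(uri)

# Simulate the state after a failover: host2 is now the primary.
chosen = first_read_write_host(hosts, lambda host: host[0] == "host2")
```

In this simulation host1 is skipped because it no longer accepts writes, and the client ends up on host2, which is the behavior the `target_session_attrs=read-write` setting asks for.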

 

Availability Calculation

Availability is usually calculated/expressed as a percentage of uptime in a given year based on the service level agreements. Some companies exclude the planned outage/scheduled downtime based on their agreements with customers on the availability of their services.

The table below translates a given availability percentage, from four nines to five nines, into the corresponding amount of time a system would be unavailable.

| Availability % | Downtime per year | Downtime per month | Downtime per week | Downtime per day |
|---|---|---|---|---|
| 99.99% ("four nines") | 52.60 minutes | 4.38 minutes | 1.01 minutes | 8.64 seconds |
| 99.995% ("four and a half nines") | 26.30 minutes | 2.19 minutes | 30.24 seconds | 4.32 seconds |
| 99.999% ("five nines") | 5.26 minutes | 26.30 seconds | 6.05 seconds | 864.00 milliseconds |
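
The figures in the table follow from a simple calculation: the allowed downtime is the length of the period multiplied by the unavailable fraction, 1 - availability/100. The short Python sketch below reproduces them. The function and constant names are illustrative, and the 365.25-day average year (with a month taken as one twelfth of it) is an assumption, chosen because it matches the table's values.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, matches the table

PERIODS_IN_SECONDS = {
    "year": DAYS_PER_YEAR * SECONDS_PER_DAY,
    "month": DAYS_PER_YEAR / 12 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime_seconds(availability_percent):
    """Return the allowed downtime in seconds for each period."""
    unavailable_fraction = 1 - availability_percent / 100
    return {
        period: seconds * unavailable_fraction
        for period, seconds in PERIODS_IN_SECONDS.items()
    }


four_nines = allowed_downtime_seconds(99.99)
five_nines = allowed_downtime_seconds(99.999)

# Print the yearly budget in minutes for each availability level.
for label, budget in (("99.99%", four_nines), ("99.999%", five_nines)):
    print(f"{label}: {budget['year'] / 60:.2f} minutes of downtime per year")
```

For 99.99% this gives about 52.60 minutes per year and 8.64 seconds per day, in line with the first row of the table.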

 

Depending on the use case and service level agreement, EDB has been able to help customers achieve five nines with EDB Postgres.

Want to learn more about how to operate Postgres at scale, with flexible deployment options? Check out the EDB Postgres Platform.

Vibhor Kumar, Chief Performance Architect

Vibhor Kumar is a Chief Performance Architect with 12+ years of leadership experience in designing innovative business solutions for customers. He leads the performance engineering team at EnterpriseDB.