Contributed by Bobby Bissett
High availability means keeping an enterprise’s critical data infrastructure running with virtually no downtime and ensuring that its databases keep running in the event of a failure. Enter EDB Postgres™ Failover Manager, the high availability solution from EnterpriseDB® (EDB™), which was recently enhanced with controlled switchover and switchback and with new options for customized configurations.
EDB Failover Manager provides highly available, fault-tolerant database clusters built on PostgreSQL streaming replication, reducing downtime and keeping data available when a main database fails. It provides the cluster monitoring, failure detection, and failover procedures that can be integrated into a variety of high availability solutions measured in 9s of uptime.
For EDB customers already using EDB Failover Manager, taking advantage of the new features means upgrading. With the new upgrade utility introduced in EFM 2.1, moving to the new features is now much closer to a turnkey process. What follows is a summary of the key steps in the upgrade process and what to expect, including the changes made to the configuration files and how to upgrade your files to their 2.1 versions. We will assume the default cluster name ‘efm’ here, so the files used are efm.properties and efm.nodes in the /etc/efm-2.1/ directory.
(For information on how to use other cluster names, see section 4.3 of the user’s guide.)
We will discuss the changes to the two configuration files below. Following that will be an example of running the new upgrade utility to make the changes automatically.
The efm.nodes File
The format and information in the .nodes file have not changed from Failover Manager version 2.0 to 2.1. The only difference is an extra line of comment text at the top of the efm.nodes.in template file:
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
New in 2.1, a “membership coordinator” makes it easier to add existing nodes to a running cluster. If there are, for example, already five nodes in a cluster, you don’t need to add all five addresses to the file when starting a new agent. Only one address is needed, and that address is available from the efm cluster-status <cluster name> output (generally the first node started).
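To make the file format concrete, here is a minimal Python sketch that reads efm.nodes-style text into (address, port) pairs and checks that at least one address, such as the membership coordinator’s, is present. It assumes only what the template comment says: lines starting with # are comments, and entries are address:port tokens separated by whitespace. The sample address and port are hypothetical, and this is not code that ships with Failover Manager.

```python
def parse_nodes(text: str) -> list[tuple[str, int]]:
    """Parse efm.nodes-style text into (address, port) pairs.

    Assumes '#' starts a comment and entries are address:port tokens
    separated by whitespace, as the template's comment text describes.
    """
    nodes = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop any comment text
        for token in line.split():
            # rpartition splits on the last ':' so bracketed IPv6 hosts survive
            address, sep, port = token.rpartition(":")
            if not sep or not address or not port.isdigit():
                raise ValueError(f"expected address:port, got {token!r}")
            nodes.append((address, int(port)))
    return nodes


# Hypothetical file content: only the membership coordinator is listed.
sample = """# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.10.1:7800
"""

nodes = parse_nodes(sample)
if not nodes:
    raise SystemExit("efm.nodes lists no addresses; add the membership coordinator")
print(nodes)  # [('192.168.10.1', 7800)]
```

Since a new agent only needs one reachable member to join, a single entry like this is enough for the check to pass.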
Section 3.2.2 of the Failover Manager user’s guide describes the efm.nodes file, and a future blog post will discuss startup in more detail. The upgrade utility will create a 2.1 efm.nodes file for you from the file your deployment currently uses, but you can also copy that file as-is to your 2.1 installation, making sure its permissions and ownership match those of the efm.nodes.in template file.
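Whether you copy the file by hand or let the upgrade utility write it, matching the template’s permissions and ownership is on you. The following Python sketch compares the owner, group, and permission bits of a configuration file with those of its template. The /etc/efm-2.1 paths assume the default ‘efm’ cluster name and that the .in template sits alongside the configuration file; adjust them for your installation.

```python
import os
import stat


def ownership_and_mode(path: str) -> tuple[int, int, int]:
    """Return (uid, gid, permission bits) for a file or directory."""
    st = os.stat(path)
    return st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode)


def matches_template(config_path: str, template_path: str) -> bool:
    """True if config_path has the same owner, group, and mode as template_path."""
    return ownership_and_mode(config_path) == ownership_and_mode(template_path)


if __name__ == "__main__":
    # Assumed locations for the default 'efm' cluster on a 2.1 installation.
    cfg = "/etc/efm-2.1/efm.nodes"
    tmpl = "/etc/efm-2.1/efm.nodes.in"
    if os.path.exists(cfg) and os.path.exists(tmpl):
        if matches_template(cfg, tmpl):
            print("efm.nodes matches its template")
        else:
            print("mismatch: fix with chmod/chown to match efm.nodes.in")
```

The same check applies to efm.properties and efm.properties.in.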
The efm.properties File
This section discusses changes to the properties Failover Manager reads at startup. For each property, the description text in the file and the user documentation give more information. Some properties did not change but can now be used in different ways:
- user.email – This property now supports multiple email addresses. It is optional if the script.notification property (described below) is being used.
- sudo.command – The property itself has not changed, but the scripts Failover Manager uses to invoke functions as either root or the database owner have. This can simplify permissions management for users who want to use a third-party product instead of sudo.
- script.fence and script.post.promotion – These are unchanged but now accept more information: for both, you can specify whether the addresses of the failed node and/or the new master node are passed to the script.
Two properties, jgroups.max.tries and jgroups.timeout, were removed. They were replaced by a single property:
- node.timeout – How long, in seconds, agents will wait before declaring that another agent/node has disappeared from the cluster. Its value is the total wait implied by the previous settings (jgroups.timeout multiplied by jgroups.max.tries, converted from milliseconds to seconds). The default is 50 seconds.
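The conversion from the old settings is simple arithmetic: the per-try timeout in milliseconds multiplied by the number of tries, expressed in seconds. The Python sketch below reproduces the example the upgrade utility reports in this post (5000 ms and 10 tries giving 50 seconds). Rounding up when the total is not a whole number of seconds is an assumption of this sketch, not documented behavior of the utility.

```python
def node_timeout_seconds(jgroups_timeout_ms: int, jgroups_max_tries: int) -> int:
    """Convert the removed 2.0 jgroups.* settings into a 2.1 node.timeout value.

    The total wait is the per-try timeout (ms) times the number of tries.
    Integer ceiling division rounds partial seconds up (an assumption here),
    so the new value never waits less than the old settings did.
    """
    total_ms = jgroups_timeout_ms * jgroups_max_tries
    return (total_ms + 999) // 1000


print(node_timeout_seconds(5000, 10))  # 50
```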
The following properties have been added:
- db.service.name – This should be set if you are running your database servers as Linux services. For instance, if set to “ppas-9.5” on RHEL 7, Failover Manager will use “systemctl restart ppas-9.5” to restart a database rather than pg_ctl. If this is set, the db.bin property is optional.
- script.notification – Instead of (or in addition to) sending email notifications, Failover Manager can call a user-supplied script, making it possible to plug into systems other than SMTP. The script is called with two parameters: the subject and the body of the notification. If this is set, the user.email property becomes optional.
- auto.allow.hosts – If set to true, this allows the user to avoid authorizing new nodes before they join the cluster, making startup faster for static clusters. The default is false.
- promotable and minimum.standbys – These properties can be used separately or together to make sure that a Failover Manager cluster does not promote more standbys than desired. The defaults are true and 0, respectively.
- recovery.check.period – While a standby is being promoted, this property controls how many seconds apart to check if the database has come out of recovery. The default is two seconds.
- auto.resume.period – An agent will be in IDLE state after a database failure. If this property is set to a non-zero value, Failover Manager will automatically attempt to resume monitoring every <value> seconds. The default is 0, meaning the efm resume <cluster name> command is needed to resume monitoring after the database has been restarted.
- script.resumed – A script that can be run whenever an IDLE agent resumes monitoring its local database.
- jvm.options – Options passed to the Java virtual machine that runs the Failover Manager agent at startup. The default value sets the agent’s maximum memory to 32 MB.
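As an illustration of script.notification, here is a hypothetical Python handler. The only contract it relies on is the one described above: Failover Manager passes two arguments, the subject and the body. The JSON-lines log file and its path are assumptions standing in for whatever system you actually want to feed.

```python
#!/usr/bin/env python3
"""Hypothetical handler for the script.notification property.

Failover Manager calls the script with two arguments: subject and body.
This sketch appends each notification as one JSON line to a log file
(an assumed destination) that another system could pick up.
"""
import json
import sys
from datetime import datetime, timezone

LOG_PATH = "/tmp/efm-notifications.jsonl"  # hypothetical destination


def build_record(subject: str, body: str) -> dict:
    """Package one notification with a UTC receive timestamp."""
    return {
        "received_at": datetime.now(timezone.utc).isoformat(),
        "subject": subject,
        "body": body,
    }


def main(argv: list[str], log_path: str = LOG_PATH) -> int:
    """Expect argv = [script, subject, body]; return a process exit status."""
    if len(argv) != 3:
        print("usage: notify.py <subject> <body>", file=sys.stderr)
        return 2
    record = build_record(argv[1], argv[2])
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return 0


# Act only when invoked with arguments, as Failover Manager would call it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Writing to a local file keeps the sketch dependency-free; a real handler would more likely post to a chat, paging, or monitoring API.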
The above lists the changes to the individual properties. They have also been regrouped within the file to help make related properties easier to understand. The template file efm.properties.in has the new order, and the upgrade utility (below) will use this template when migrating your older file to the new version.
The Upgrade Utility
The efm script in Failover Manager 2.1 includes an upgrade feature that can be used to quickly migrate your 2.0 configuration files into your new installation. Invoked with a cluster name, it will look for <clustername>.properties and <clustername>.nodes in the /etc/efm-2.0/ directory. It will convert their information into 2.1 files in /etc/efm-2.1 (any existing files will be renamed to include a timestamp).
The following example shows the tool run with the default ‘efm’ cluster name:
[root@FOUR efm-2.1]# /usr/efm-2.1/bin/efm upgrade-conf efm
Processing efm.properties file.
Setting new property node.timeout to 50 (sec) based on existing timeout 5000 (ms) and max tries 10.
Processing efm.nodes file.
Upgrade of files is finished. Please ensure that the new file permissions match those of the template files before starting EFM.
The db.service.name property should be set before starting a non-witness agent.
From the output, you can see that the old jgroups.* property values were used to set the new node.timeout value. Unless you are running your database servers through pg_ctl, you should edit the file to set the db.service.name property. All other new properties will be set to appropriate default values.
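Before starting an agent on the upgraded files, a quick check can catch the db.service.name reminder from the utility’s output. The Python sketch below uses a simplified key=value reader (it ignores Java .properties line continuations and escapes), and the witness flag is a plain function argument rather than a reading of any particular property. The rule that db.bin is needed when db.service.name is empty follows from the property descriptions earlier in this post.

```python
def load_properties(text: str) -> dict[str, str]:
    """Minimal key=value reader; a simplification of the .properties format."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def startup_warnings(props: dict[str, str], witness: bool = False) -> list[str]:
    """List issues to fix before starting a non-witness agent."""
    warnings = []
    if witness:
        return warnings  # the upgrade output only flags non-witness agents
    if not props.get("db.service.name"):
        warnings.append("db.service.name is empty; set it unless the database is managed with pg_ctl")
        if not props.get("db.bin"):
            warnings.append("db.bin is also empty; it is required when db.service.name is not set")
    return warnings


# Hypothetical upgraded file that still relies on pg_ctl via db.bin.
sample = "db.service.name=\ndb.bin=/usr/ppas-9.5/bin\n"
for warning in startup_warnings(load_properties(sample)):
    print(warning)
```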
In future blogs with screencasts, we will cover all of the 2.1 cluster properties in more detail, along with improvements to starting up a Failover Manager cluster.
Bobby Bissett is a Cloud Architect at EnterpriseDB.