The cluster properties file v4

Each node in a Failover Manager cluster has a properties file (by default, named efm.properties) that contains the properties of the individual node on which it resides. The Failover Manager installer creates a template for the properties file, named efm.properties.in, in the /etc/edb/efm-4.<x> directory.

After completing the Failover Manager installation, make a working copy of the template before modifying the file contents:

# cp /etc/edb/efm-4.8/efm.properties.in /etc/edb/efm-4.8/efm.properties

After copying the template file, change the owner of the file to efm:

# chown efm:efm /etc/edb/efm-4.8/efm.properties

Note

By default, Failover Manager expects the cluster properties file to be named efm.properties. If you name the properties file something other than efm.properties, modify the service script or unit file to instruct Failover Manager to use a different name.

After creating the cluster properties file, add or modify configuration parameter values as required. For detailed information about each property, see Specifying cluster properties.

The property files are owned by root. The Failover Manager service script expects to find the files in the /etc/edb/efm-4.<x> directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.

Note

All user scripts referenced in the properties file are invoked as the Failover Manager user.

Specifying cluster properties

You can use the properties listed in the cluster properties file to specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts. If you modify a property value, you must restart Failover Manager to apply the changes.

Property values are case sensitive. While Postgres uses quoted strings in parameter values, Failover Manager doesn't allow quoted strings in property values. For example, while you might specify an IP address in a Postgres configuration parameter as:

listen_addresses='192.168.2.47'

With Failover Manager, don't enclose the value in quotes:

bind.address=192.168.2.54:7800
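Because quoted values are a common mistake when moving between postgresql.conf and efm.properties, a quick pre-flight check can help. The following is a minimal sketch, not a Failover Manager tool: the function name find_quoted_values is hypothetical, and it assumes simple key=value lines with no backslash continuations.

```python
def find_quoted_values(text: str) -> list[str]:
    """Return the keys whose values are wrapped in single or double quotes.

    Hypothetical helper; assumes one key=value pair per line.
    """
    offenders = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Flag values that start and end with the same quote character.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            offenders.append(key.strip())
    return offenders


sample = """
# Correct: no quotes around the value
bind.address=192.168.2.54:7800
# Incorrect: Postgres-style quoting
ping.server.ip='8.8.8.8'
"""
print(find_quoted_values(sample))  # ['ping.server.ip']
```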

Use the properties in the efm.properties file to specify connection, administrative, and operational details for Failover Manager.

Legend for the following table:

  • A: Required on primary or standby node
  • W: Required on witness node
  • Y: Yes

| Property name | A | W | Default value | Comments |
|---|---|---|---|---|
| db.user | Y | Y | | Username for the database. |
| db.password.encrypted | Y | Y | | Password encrypted using 'efm encrypt'. |
| db.port | Y | Y | | This value must be the same for all the agents. |
| db.database | Y | Y | | Database name. |
| db.service.owner | Y | | | Owner of the $PGDATA directory for db.database. |
| db.service.name | | | | Required if running the database as a service. |
| db.bin | Y | | | Directory containing the pg_controldata/pg_ctl commands, such as '/usr/edb/asnn/bin'. |
| db.data.dir | Y | | | Same as the output of the query 'show data_directory;'. |
| db.config.dir | | | | Same as the output of the query 'show config_file;'. Specify it if it isn't the same as db.data.dir. |
| jdbc.sslmode | Y | Y | disable | See the note. |
| user.email | | | | This value must be the same for all the agents; can be left blank if using a notification script. |
| from.email | | | efm@localhost | Leave blank to use the default efm@localhost. |
| notification.level | Y | Y | INFO | See the list of notifications. |
| notification.text.prefix | | | | |
| script.notification | | | | Required if the user.email property isn't used; both parameters can be used together. |
| bind.address | Y | Y | | Example: <ip_address>:<port> |
| external.address | | | | Example: <ip_address/hostname> |
| admin.port | Y | Y | 7809 | Modify if the default port is already in use. |
| is.witness | Y | Y | | See description. |
| local.period | Y | | 10 | |
| local.timeout | Y | | 60 | |
| local.timeout.final | Y | | 10 | |
| remote.timeout | Y | Y | 10 | |
| node.timeout | Y | Y | 50 | This value must be the same for all the agents. |
| encrypt.agent.messages | Y | Y | false | This value must be the same for all the agents. |
| enable.stop.cluster | | | true | This value must be the same for all the agents. Available in Failover Manager 4.2 and later. |
| stop.isolated.primary | Y | | true | This value must be the same for all the agents. |
| stop.failed.primary | Y | | true | |
| primary.shutdown.as.failure | Y | Y | false | |
| update.physical.slots.period | Y | | 0 | |
| ping.server.ip | Y | Y | 8.8.8.8 | |
| ping.server.command | Y | Y | /bin/ping -q -c3 -w5 | |
| auto.allow.hosts | Y | Y | false | |
| stable.nodes.file | Y | Y | false | |
| db.reuse.connection.count | Y | | 0 | |
| auto.failover | Y | Y | true | |
| auto.reconfigure | Y | | true | This value must be the same for all the agents. |
| promotable | Y | | true | |
| use.replay.tiebreaker | Y | Y | true | This value must be the same for all the agents. |
| standby.restart.delay | | | 0 | |
| application.name | | | | Set to replace the application_name portion of the primary_conninfo entry with this property value before starting the original primary database as a standby. |
| restore.command | | | | Example: restore.command=scp <db_service_owner>@%h:<archive_path>/%f %p |
| reconfigure.num.sync | Y | | false | If you're on Failover Manager 4.1 or later, see reconfigure.num.sync.max to raise num_sync. |
| reconfigure.num.sync.max | | | | Available in Failover Manager 4.1 and later. |
| reconfigure.sync.primary | Y | | false | |
| minimum.standbys | Y | Y | 0 | This value must be the same for all the nodes. |
| priority.standbys | | | | Available in Failover Manager 4.2 and later. |
| recovery.check.period | Y | | 1 | |
| restart.connection.timeout | | | 60 | |
| auto.resume.period | Y | | 0 | |
| virtual.ip | | | (see virtual.ip.single) | Leave blank if you don't specify a VIP. |
| virtual.ip.interface | | | | Required if you specify a VIP. |
| virtual.ip.prefix | | | | Required if you specify a VIP. |
| virtual.ip.single | Y | Y | Yes | This value must be the same for all the nodes. |
| check.vip.before.promotion | Y | Y | Yes | |
| pgpool.enable | | | false | Available in Failover Manager 4.1 and later. |
| pcp.user | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
| pcp.host | | | | Required if pgpool.enable is set to true; this value must be the same for all the agents. Available in Failover Manager 4.1 and later. |
| pcp.port | | | | Required if pgpool.enable is set to true; this value must be the same for all the agents. Available in Failover Manager 4.1 and later. |
| pcp.pass.file | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
| pgpool.bin | | | | Required if pgpool.enable is set to true. Available in Failover Manager 4.1 and later. |
| script.load.balancer.attach | | | | Example: script.load.balancer.attach=/<path>/<attach_script> %h %t |
| script.load.balancer.detach | | | | Example: script.load.balancer.detach=/<path>/<detach_script> %h %t |
| detach.on.agent.failure | | | true | Set to false if you want to keep a running primary database attached. Available in Failover Manager 4.2 and later. |
| script.fence | | | | Example: script.fence=/<path>/<script_name> %p %f |
| script.post.promotion | | | | Example: script.post.promotion=/<path>/<script_name> %f %p |
| script.resumed | | | | Example: script.resumed=/<path>/<script_name> |
| script.db.failure | | | | Example: script.db.failure=/<path>/<script_name> |
| script.primary.isolated | | | | Example: script.primary.isolated=/<path>/<script_name> |
| script.remote.pre.promotion | | | | Example: script.remote.pre.promotion=/<path>/<script_name> %p |
| script.remote.post.promotion | | | | Example: script.remote.post.promotion=/<path>/<script_name> %p |
| script.custom.monitor | | | | Example: script.custom.monitor=/<path>/<script_name> |
| custom.monitor.interval | | | | Required if a custom monitoring script is specified. |
| custom.monitor.timeout | | | | Required if a custom monitoring script is specified. |
| custom.monitor.safe.mode | | | | Required if a custom monitoring script is specified. |
| sudo.command | Y | Y | sudo | |
| sudo.user.command | Y | Y | sudo -u %u | |
| lock.dir | | | | If not specified, defaults to '/var/lock/efm-<version>'. |
| log.dir | | | | If not specified, defaults to '/var/log/efm-<version>'. |
| syslog.host | | | localhost | |
| syslog.port | | | 514 | |
| syslog.protocol | | | UDP | |
| syslog.facility | | | | |
| file.log.enabled | Y | Y | true | |
| syslog.enabled | Y | Y | false | |
| jgroups.loglevel | | | info | |
| efm.loglevel | | | info | |
| jvm.options | | | -Xmx128m | |

Cluster properties

Use the following properties to specify connection details for the Failover Manager cluster:

# The value for the password property should be the output from
# 'efm encrypt' -- do not include a cleartext password here. To
# prevent accidental sharing of passwords among clusters, the
# cluster name is incorporated into the encrypted password. If
# you change the cluster name (the name of this file), you must
# encrypt the password again with the new name.
# The db.port property must be the same for all nodes.
db.user=
db.password.encrypted=
db.port=
db.database=

The db.user specified must have enough privileges to invoke selected PostgreSQL commands on behalf of Failover Manager. For more information, see Prerequisites.

For information about encrypting the password for the database user, see Encrypting your database password.

Use the db.service.owner property to specify the name of the operating system user that owns the cluster that is being managed by Failover Manager. This property isn't required on a dedicated witness node.

# This property tells EFM which OS user owns the $PGDATA dir for
# the 'db.database'. By default, the owner is either 'postgres'
# for PostgreSQL or 'enterprisedb' for EDB Postgres Advanced
# Server. However, if you have configured your db to run as a
# different user, you will need to copy the /etc/sudoers.d/efm-XX
# conf file to grant the necessary permissions to your db owner.
#
# This username must have write permission to the
# 'db.data.dir' specified below.
db.service.owner=

Specify the name of the database service in the db.service.name property if you use the service or systemctl command when starting or stopping the service.

# Specify the proper service name in order to use service commands
# rather than pg_ctl to start/stop/restart a database. For example, if
# this property is set, then 'service <name> restart' or 'systemctl
# restart <name>'
# (depending on OS version) will be used to restart the database rather
# than pg_ctl.
# This property is required if running the database as a service.
db.service.name=

Use the same service control mechanism (pg_ctl, service, or systemctl) each time you start or stop the database service. If you use the pg_ctl program to control the service, specify the location of the pg_ctl program in the db.bin property.

# Specify the directory containing the pg_controldata/pg_ctl commands,
# for example:
# /usr/edb/as12/bin. Unless the db.service.name property is used, the
# pg_ctl command is used to start/stop/restart databases as needed
# after a failover or switchover. This property is required.
db.bin=

Use the db.data.dir property to specify the location to write a recovery file on the primary node of the cluster during promotion. This property is required on primary and standby nodes. It isn't required on a dedicated witness node.

# For database version 12 and up, this is the directory where a
# standby.signal file will exist for a standby node. For previous
# versions, this is the location of the db recovery.conf file on
# the node.
# After a failover, the recovery.conf files on remaining standbys are
# changed to point to the new primary db (a copy of the original is made
# first). On a primary node, a recovery.conf file will be written during
# failover and promotion to ensure that the primary node can not be
# restarted as the primary database.
# This corresponds to database environment variable PGDATA and should
# be same as the output of query 'show data_directory;' on respective
# database.
db.data.dir=

Use the db.config.dir property to specify the location of database configuration files if they aren't stored in the same directory as the recovery.conf or standby.signal file. This is the directory containing the file named by the config_file parameter of your EDB Postgres Advanced Server or PostgreSQL installation. Failover Manager uses this value as the location of the data directory (the pg_ctl -D value) when stopping, starting, or restarting the database. If the property is blank, the db.data.dir value is used instead.

# Specify the location of database configuration files if they are
# not contained in the same location as the recovery.conf or
# standby.signal file. This is most likely the case for Debian
# installations. The location specified will be used as the -D value
# (the location of the data directory for the cluster) when calling
# pg_ctl to start or stop the database. If this property is blank,
# the db.data.dir location specified by the db.data.dir property will
# be used. This corresponds to the output of query 'show config_file;'
# on respective database.
db.config.dir=

For more information about database configuration files, visit the PostgreSQL website.
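The comments for db.service.name, db.bin, and db.config.dir, together with the auto.reconfigure comment later in this file, describe how the restart command is chosen: a service command when db.service.name is set, otherwise pg_ctl from db.bin with -D taken from db.config.dir (or db.data.dir when that is blank) and -t taken from local.timeout. The following is a minimal sketch of that selection logic under those assumptions. The function restart_command and the data directory path are hypothetical, and the sketch doesn't model sudo or the efm_db_functions/efm_root_functions scripts that actually run the command.

```python
import posixpath


def restart_command(props: dict[str, str], use_systemctl: bool = True) -> list[str]:
    """Build the argv for restarting the local database (hypothetical helper)."""
    service = props.get("db.service.name", "").strip()
    if service:
        # A service name means service/systemctl is used instead of pg_ctl.
        if use_systemctl:
            return ["systemctl", "restart", service]
        return ["service", service, "restart"]

    # Otherwise pg_ctl from db.bin is used. The -D value is db.config.dir,
    # falling back to db.data.dir when db.config.dir is blank.
    directory = props.get("db.config.dir", "").strip() or props["db.data.dir"]
    pg_ctl = posixpath.join(props["db.bin"], "pg_ctl")
    timeout = props.get("local.timeout", "60")
    return [pg_ctl, "restart", "-m", "fast", "-w", "-t", timeout, "-D", directory]


props = {
    "db.bin": "/usr/edb/as12/bin",
    "db.data.dir": "/var/lib/edb/as12/data",  # hypothetical path
    "local.timeout": "60",
}
print(restart_command(props))
# ['/usr/edb/as12/bin/pg_ctl', 'restart', '-m', 'fast', '-w', '-t', '60', '-D', '/var/lib/edb/as12/data']
```

Setting db.config.dir (common on Debian-style layouts) changes only the -D argument; setting db.service.name bypasses pg_ctl entirely.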

Use the jdbc.sslmode property to instruct Failover Manager to use SSL connections. By default, SSL is disabled.

# Use the jdbc.sslmode property to enable ssl for EFM
# connections. Setting this property to anything but 'disable'
# will force the agents to use 'ssl=true' for all JDBC database
# connections (to both local and remote databases).
# Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing
# the certificate.
# require - Verification will not be performed on the server
# certificate.
jdbc.sslmode=disable
Note

If you set the value of jdbc.sslmode to verify-ca and you want to use Java trust store for certificate validation, you need to set the following value:

jdbc.properties=sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory

For information about configuring and using SSL, see Secure TCP/IP Connections with SSL and Using SSL in the PostgreSQL documentation.

Use the user.email property to specify an email address (or multiple email addresses) to receive notifications sent by Failover Manager.

# Email address(es) for notifications. The value of this
# property must be the same across all agents. Multiple email
# addresses must be separated by space. If using a notification
# script instead, this property can be left blank.
user.email=

The from.email property specifies the value to use as the sender's address for email notifications from Failover Manager. You can:

  • Leave from.email blank to use the default value (efm@localhost).
  • Specify a custom value for the email address.
  • Specify a custom email address, using the %h placeholder to represent the name of the node host (for example, example@%h). The placeholder is replaced with the name of the host as returned by the Linux hostname utility.

For more information about notifications, see Notifications.

# Use the from.email property to specify the from email address that
# will be used for email notifications. Use the %h placeholder to
# represent the name of the node host (e.g. example@%h). The
# placeholder will be replaced with the name of the host as returned
# by the hostname command.
# Leave blank to use the default, efm@localhost.
from.email=
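The rules for user.email and from.email are simple enough to express directly: user.email holds space-separated addresses, a blank from.email means efm@localhost, and %h is replaced with the host name. The following is a minimal sketch of those rules. The helper names are hypothetical, and socket.gethostname() is used as a stand-in for the output of the hostname utility.

```python
import socket
from typing import Optional


def resolve_from_email(from_email: str, hostname: Optional[str] = None) -> str:
    """Return the sender address implied by a from.email value."""
    if not from_email.strip():
        return "efm@localhost"  # documented default when the property is blank
    # %h is replaced with the host name; socket.gethostname() stands in for
    # the hostname utility when no explicit hostname is passed.
    host = hostname if hostname is not None else socket.gethostname()
    return from_email.strip().replace("%h", host)


def recipients(user_email: str) -> list[str]:
    """Split user.email into individual addresses (space-separated)."""
    return user_email.split()


print(resolve_from_email("efm@%h", hostname="node1"))  # efm@node1
print(recipients("dba@example.com ops@example.com"))   # ['dba@example.com', 'ops@example.com']
```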

Use the notification.level property to specify the minimum severity level at which Failover Manager sends user notifications or calls the notification script. For a complete list of notifications, see Notifications.

# Minimum severity level of notifications that will be sent by
# the agent. The minimum level also applies to the notification
# script (below). Valid values are INFO, WARNING, and SEVERE.
# A list of notifications is grouped by severity in the user's
# guide.
notification.level=INFO
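The threshold works as an ordered comparison over the three documented levels. The following is a minimal sketch of that filter. The names SEVERITY_ORDER and should_notify are hypothetical, not part of Failover Manager.

```python
# The three valid levels, ordered from least to most severe.
SEVERITY_ORDER = {"INFO": 0, "WARNING": 1, "SEVERE": 2}


def should_notify(message_severity: str, notification_level: str = "INFO") -> bool:
    """Return True if a notification at message_severity meets the threshold."""
    return SEVERITY_ORDER[message_severity] >= SEVERITY_ORDER[notification_level]


print(should_notify("WARNING", "INFO"))  # True
print(should_notify("INFO", "WARNING"))  # False
```

With notification.level=WARNING, INFO notifications are neither emailed nor passed to the notification script.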

Use the notification.text.prefix property to specify the text to add to the beginning of every notification.

# Text to add to the beginning of every notification. This could
# be used to help identify what the cluster is used for, the role
# of this node, etc. To use multiple lines, add a backslash \ to
# the end of a line of text. To include a newline use \n.
# Example:
# notification.text.prefix=Development cluster for Example dept.\n\
# Used by Dev and QA \
# See Example group for questions.
notification.text.prefix=
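The backslash-continuation and \n conventions described in the comment follow the usual Java properties-file rules. The following is a minimal sketch of how the example prefix would be read, assuming java.util.Properties-style parsing: a line ending in an odd number of backslashes continues onto the next line, leading whitespace on the continuation is dropped, and \n becomes a newline. The function parse_properties is hypothetical and simplified: it only recognizes = as the separator and doesn't handle \uXXXX escapes.

```python
from typing import Optional

_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "f": "\f"}


def _continues(line: str) -> bool:
    # An odd number of trailing backslashes means the next line continues this one.
    trailing = len(line) - len(line.rstrip("\\"))
    return trailing % 2 == 1


def _unescape(value: str) -> str:
    # Translate \n, \t, \r, \f; any other escaped character stands for itself.
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            out.append(_ESCAPES.get(value[i + 1], value[i + 1]))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def parse_properties(lines: list[str]) -> dict[str, str]:
    props: dict[str, str] = {}

    def store(logical: str) -> None:
        if "=" in logical:
            key, value = logical.split("=", 1)
            props[key.strip()] = _unescape(value.lstrip())

    pending: Optional[str] = None  # text of a logical line still being continued
    for raw in lines:
        line = raw.rstrip("\r\n").lstrip()
        if pending is None and (not line or line.startswith("#")):
            continue  # skip blank lines and comments
        if _continues(line):
            pending = (pending or "") + line[:-1]  # drop the continuation backslash
            continue
        store((pending or "") + line)
        pending = None
    if pending is not None:
        store(pending)  # file ended in the middle of a continuation
    return props


example = [
    "notification.text.prefix=Development cluster for Example dept.\\n\\",
    "    Used by Dev and QA \\",
    "    See Example group for questions.",
]
print(repr(parse_properties(example)["notification.text.prefix"]))
# 'Development cluster for Example dept.\nUsed by Dev and QA See Example group for questions.'
```

Note that the space before the second backslash is kept, which is why "QA" and "See" stay separated in the result.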

Use the script.notification property to specify the path to a user-supplied script that acts as a notification service. The script is passed a message subject and a message body. The script is invoked each time Failover Manager generates a user notification.

# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message
# body.
script.notification=

The bind.address property specifies the IP address and port number of the agent on the current node of the Failover Manager cluster.

# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form
# <ip>:<port>.
# Note that the port specified here is used for communicating
# with other nodes, and is not the same as the admin.port below,
# used only to communicate with the local agent to send control
# signals.
# For example, <provide_your_ip_address_here>:7800
bind.address=

Use the external.address property to specify the IP address or hostname to use for communication with all other Failover Manager agents in a NAT environment.

# This is the ip address/hostname to be used for communication with all
# other Failover Manager agents. All traffic towards this address
# should be routed by the network to the bind.address of the node.
# The value is in the ip/hostname format only. This address will be
# used in scenarios where nodes are on different networks and broadcast
# an IP address other than the bind.address to the external world.
external.address=

Use the admin.port property to specify a port on which Failover Manager listens for administrative commands.

# This property controls the port binding of the administration
# server which is used for some commands (ie cluster-status). The
# default is 7809; you can modify this value if the port is
# already in use.
admin.port=7809

Set the is.witness property to true to indicate that the current node is a witness node. If is.witness is true, the local agent doesn't check to see if a local database is running.

# Specifies whether or not this is a witness node. Witness nodes
# do not have local databases running.
is.witness=

The pg_is_in_recovery() function, available in both PostgreSQL and EDB Postgres Advanced Server, is a Boolean function that reports the recovery state of a database. The function returns true if the database is in recovery or false if it isn't. When an agent starts, it connects to the local database and invokes the pg_is_in_recovery() function.

  • If the server responds true, the agent assumes the role of standby.
  • If the server responds false, the agent assumes the role of primary.
  • If there's no local database, the agent assumes an idle state.
Note

If is.witness is true, Failover Manager doesn't check the recovery state.
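The startup rules above amount to a small decision table. The following is a minimal sketch of it. The function initial_role and the "witness" label are hypothetical; in_recovery stands for the result of SELECT pg_is_in_recovery(), or None when no local database could be reached.

```python
from typing import Optional


def initial_role(is_witness: bool, in_recovery: Optional[bool]) -> str:
    """Map the documented startup checks to an agent role."""
    if is_witness:
        return "witness"  # recovery state isn't checked on a witness node
    if in_recovery is None:
        return "idle"  # no local database responded
    return "standby" if in_recovery else "primary"


print(initial_role(False, True))   # standby
print(initial_role(False, False))  # primary
print(initial_role(False, None))   # idle
```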

The following properties apply to the local server:

  • The local.period property specifies the number of seconds between attempts to contact the database server.
  • The local.timeout property specifies the number of seconds an agent waits for a positive response from the local database server.
  • The local.timeout.final property specifies the number of seconds an agent waits after the previous checks have failed to contact the database server on the current node. If a response isn't received from the database within the number of seconds specified by the local.timeout.final property, the database is assumed to have failed.

For example, given the default values of these properties, a check of the local database happens once every 10 seconds. If an attempt to contact the local database doesn't come back positive within 60 seconds, Failover Manager makes a final attempt to contact the database. If a response isn't received within 10 seconds, Failover Manager declares database failure and notifies the administrator listed in the user.email property. These properties aren't required on a dedicated witness node.

# These properties apply to the connection(s) EFM uses to monitor
# the local database. Every 'local.period' seconds, a database
# check is made in a background thread. If the main monitoring
# thread does not see that any checks were successful in
# 'local.timeout' seconds, then the main thread makes a final
# check with a timeout value specified by the
# 'local.timeout.final' value. All values are in seconds.
# Whether EFM uses single or multiple connections for database
# checks is controlled by the 'db.reuse.connection.count'
# property.
local.period=10
local.timeout=60
local.timeout.final=10

If necessary, modify these values to suit your business model.
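When tuning these values, it helps to work out roughly how long local database failure detection takes. The following is a minimal back-of-the-envelope sketch, assuming the behavior described above: background checks every local.period seconds, a final check once local.timeout seconds pass without success, and failure declared if the final check gets no response within local.timeout.final seconds. It ignores how long each check itself takes and thread scheduling, and the function name is hypothetical.

```python
def local_failure_timeline(local_period: int = 10,
                           local_timeout: int = 60,
                           local_timeout_final: int = 10) -> tuple[int, int]:
    """Return (checks attempted in the timeout window, approx. worst-case seconds).

    The worst case is measured from the last successful check until the
    local database is declared failed. Approximation only.
    """
    checks_in_window = local_timeout // local_period
    worst_case_seconds = local_timeout + local_timeout_final
    return checks_in_window, worst_case_seconds


print(local_failure_timeline())  # (6, 70)
```

With the defaults, about six background checks can fail before the final check, and the database is declared failed roughly 70 seconds after the last successful check.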

Use the remote.timeout property to limit how many seconds an agent waits for a response from a remote agent or database. Agents only send messages to each other during cluster events. Examples include:

  • Attempting to connect to a remote database that may have failed and asking other agents if they can connect.
  • A primary agent requesting recovery settings from a standby agent as part of a switchover.
  • Telling nodes to prepare to shut down when stopping the Failover Manager cluster.

# Timeout for a call to check if a remote database is responsive.
# For example, this is how long a standby would wait for a
# DB ping request from itself and the witness to the primary DB
# before performing failover.
remote.timeout=10

Use the node.timeout property to specify the number of seconds for an agent to wait for a heartbeat from another node when determining if a node has failed.

# The total amount of time in seconds to wait before determining
# that a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.
node.timeout=50

Summary of timeout properties

  • The local.* properties are for failure detection of an agent's local database.
  • The node.timeout property is for failure detection of other nodes.
  • The remote.timeout property limits how long agents wait for responses from other agents.

Use the encrypt.agent.messages property to specify whether to encrypt the messages sent between agents.

# Set to true to encrypt messages that are sent between agents.
# This property must be the same on all agents or else the agents
# will not be able to connect.
encrypt.agent.messages=false

Use the enable.stop.cluster property to enable or disable the stop-cluster command. The command is a convenience in some environments but can cause issues when unintentionally invoked. In Eager Failover mode, the command results in stopping EDB Postgres Advanced Server without failover.

# Whether or not the 'efm stop-cluster <cluster name>' command is enabled.
# Set to false to disable the command, in which case all Failover
# Manager agents must be stopped individually. Note that stopping each
# agent separately will change the .nodes files on remaining agents
# unless stable.nodes.file is also true. This property value must
# be the same on all agents if set. The default is true if not set.
enable.stop.cluster=true

Use the stop.isolated.primary property to instruct Failover Manager to shut down the database if a primary agent detects that it's isolated. When true (the default), Failover Manager stops the database before invoking the script specified in the script.primary.isolated property.

# Shut down the database after a primary agent detects that it has
# been isolated from the majority of the efm cluster. If set to
# true, efm will stop the database before running the
# 'script.primary.isolated' script, if a script is specified.
stop.isolated.primary=true

Use the stop.failed.primary property to instruct Failover Manager to attempt to shut down a primary database if it can't reach the database. If true, Failover Manager runs the script specified in the script.db.failure property after attempting to shut down the database.

# Attempt to shut down a failed primary database after EFM can no
# longer connect to it. This can be used for added safety in the
# case a failover is caused by a failure of the network on the
# primary node.
# If specified, a 'script.db.failure' script is run after this attempt.
stop.failed.primary=true

Use the primary.shutdown.as.failure property to treat any shutdown of the Failover Manager agent on the primary node as a failure. If this property is set to true and the primary agent is shut down, the rest of the cluster treats the shutdown as a failure. This includes any proper shutdown of the agent, such as a shutdown of the whole node. None of the timeout properties apply in this case: when the agent exits, the rest of the cluster is notified immediately. After the agent exits, the rest of the cluster performs the checks that happen in the case of a primary agent failure. The checks include attempting to connect to the primary database, checking whether the VIP is reachable (if one is used), and so on.

  • If the database is reached, a notification is sent informing you of the agent status.
  • If the database isn't reached, a failover occurs.

# Treat a primary agent shutdown as an agent failure. This can be set
# to true to treat a primary agent shutdown as a failure situation.
# Caution should be used when using this feature, as it could
# cause an unwanted promotion in the case of performing primary
# database maintenance.
# Please see the user's guide for more information.
primary.shutdown.as.failure=false

The primary.shutdown.as.failure property is meant to catch user errors, such as the accidental shutdown of a primary node, rather than failures. However, the proper shutdown of a node can look to the rest of the cluster the same as a user deliberately stopping the primary Failover Manager agent, for example to perform maintenance on the primary database. If you set the primary.shutdown.as.failure property to true, take care when performing maintenance.

To perform maintenance on the primary database when primary.shutdown.as.failure is true, stop the primary agent and wait to receive a notification that the primary agent has failed but the database is still running. Then, it is safe to stop the primary database. Alternatively, you can use the stop-cluster command to stop all of the agents without performing failure checks.

Use the update.physical.slots.period property to define the slot advance frequency for database version 12 and later. When update.physical.slots.period is set to a positive integer value, the primary agent reads the current restart_lsn of the physical replication slots every update.physical.slots.period seconds and sends this information, along with its pg_current_wal_lsn and primary_slot_name (if it's set in the postgresql.conf file), to the standbys. The physical slots must already exist on the primary for the agent to find them. If the physical slots don't already exist on the standbys, standby agents create them and then update their restart_lsn values. A non-promotable standby doesn't create new slots but updates them if they exist.

Before updating the restart_lsn value of a slot, the agent checks to see if an xmin value has been set, which may happen if this was previously a primary node. If an xmin value has been set for the slot, the agent drops and recreates the slot before updating the restart_lsn value.

Note: all slot names, including one set on the current primary if desired, must be unique.

# Period in seconds between having the primary agent update promotable
# standbys with physical replication slot information so that
# the cluster will continue to use replication slots after a failover.
# Set to zero to turn off.
update.physical.slots.period=0
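The standby-side rules above (create missing slots only on promotable standbys, drop and recreate a slot that has an xmin, then update restart_lsn) can be modeled in a few lines. The following is a minimal in-memory sketch of that decision logic, not Failover Manager's implementation: the Slot class and apply_slot_update function are hypothetical, LSNs are plain integers, and no SQL is issued.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    restart_lsn: Optional[int] = None
    xmin: Optional[int] = None


def apply_slot_update(local_slots: dict[str, Slot],
                      primary_slots: dict[str, int],
                      promotable: bool) -> list[str]:
    """Apply restart_lsn values sent by the primary; return the actions taken."""
    actions = []
    for name, restart_lsn in primary_slots.items():
        slot = local_slots.get(name)
        if slot is None:
            if not promotable:
                continue  # non-promotable standbys don't create new slots
            slot = local_slots[name] = Slot()
            actions.append(f"create {name}")
        elif slot.xmin is not None:
            # An xmin may be left over if this node was previously a primary.
            slot = local_slots[name] = Slot()
            actions.append(f"recreate {name}")
        slot.restart_lsn = restart_lsn
        actions.append(f"advance {name}")
    return actions


standby = {"s1": Slot(restart_lsn=10, xmin=5)}
print(apply_slot_update(standby, {"s1": 100, "s2": 200}, promotable=True))
# ['recreate s1', 'advance s1', 'create s2', 'advance s2']
```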

Use the ping.server.ip property to specify the IP address of a server that Failover Manager can use to confirm that network connectivity isn't a problem.

# This is the address of a well-known server that EFM can ping
# in an effort to determine network reachability issues. It
# might be the IP address of a nameserver within your corporate
# firewall or another server that *should* always be reachable
# via a 'ping' command from each of the EFM nodes.
#
# There are many reasons why this node might not be considered
# reachable: firewalls might be blocking the request, ICMP might
# be filtered out, etc.
#
# Do not use the IP address of any node in the EFM cluster
# (primary, standby, or witness) because this ping server is meant
# to provide an additional layer of information should the EFM
# nodes lose sight of each other.
#
# The installation default is Google's DNS server.
ping.server.ip=8.8.8.8

Use the ping.server.command property to specify the command used to test network connectivity.

# This command will be used to test the reachability of certain
# nodes.
#
# Do not include an IP address or hostname on the end of
# this command - it will be added dynamically at runtime with the
# values contained in 'virtual.ip' and 'ping.server.ip'.
#
# Make sure this command returns reasonably quickly - test it
# from a shell command line first to make sure it works properly.
ping.server.command=/bin/ping -q -c3 -w5
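Because the target address is added to the end of ping.server.command at runtime, you can reproduce the full command line to test it before starting the agents. The following is a minimal sketch that assumes the address is appended as the final argument. The helpers build_ping_command and is_reachable are hypothetical.

```python
import shlex
import subprocess


def build_ping_command(ping_server_command: str, target: str) -> list[str]:
    """Append the target address to ping.server.command as a final argument."""
    return shlex.split(ping_server_command) + [target]


def is_reachable(ping_server_command: str, target: str, timeout: float = 10.0) -> bool:
    """Run the composed command; treat exit status 0 as reachable."""
    try:
        result = subprocess.run(build_ping_command(ping_server_command, target),
                                capture_output=True, timeout=timeout, check=False)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


print(build_ping_command("/bin/ping -q -c3 -w5", "8.8.8.8"))
# ['/bin/ping', '-q', '-c3', '-w5', '8.8.8.8']
```

Running is_reachable from each node is a quick way to confirm that the command returns reasonably quickly and that ICMP isn't filtered.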

Use the auto.allow.hosts property to instruct the server to use the addresses specified in the .nodes file of the first node started to update the allowed host list. Enabling this property by setting auto.allow.hosts to true can simplify cluster startup.

# Have the first node started automatically add the addresses
# from its .nodes file to the allowed host list. This will make
# it faster to start the cluster when the initial set of hosts
# is already known.
auto.allow.hosts=false

Use the stable.nodes.file property to instruct the server not to rewrite the nodes file when a node joins or leaves the cluster. This property is most useful in clusters with IP addresses that don't change.

# When set to true, EFM will not rewrite the .nodes file whenever
# new nodes join or leave the cluster. This can help starting a
# cluster in the cases where it is expected for member addresses
# to be mostly static, and combined with 'auto.allow.hosts' makes
# startup easier when learning failover manager.
stable.nodes.file=false

The db.reuse.connection.count property allows the administrator to specify the number of times Failover Manager reuses the same database connection to check the database health. The default value is 0, indicating that Failover Manager creates a fresh connection each time. This property isn't required on a dedicated witness node.

# This property controls how many times a database connection is
# reused before creating a new one. If set to zero, a new
# connection will be created every time an agent pings its local
# database.
db.reuse.connection.count=0
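The reuse count means each connection is used once and then reused up to db.reuse.connection.count more times before being replaced, so a value of 0 opens a fresh connection for every check. The following is a minimal sketch of that counting rule. The class ReusingConnector is hypothetical, and the connect callable stands in for whatever opens a real database connection.

```python
import itertools
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ReusingConnector(Generic[T]):
    """Hand out a connection, reusing it reuse_count times before replacing it."""

    def __init__(self, connect: Callable[[], T], reuse_count: int = 0) -> None:
        self._connect = connect
        self._reuse_count = reuse_count
        self._conn: Optional[T] = None
        self._uses = 0

    def get(self) -> T:
        # Open a new connection on first use, or once the current one has been
        # reused reuse_count times. (A real agent would also close the old
        # connection; this model doesn't.)
        if self._conn is None or self._uses > self._reuse_count:
            self._conn = self._connect()
            self._uses = 0
        self._uses += 1
        return self._conn


counter = itertools.count(1)
connector = ReusingConnector(lambda: next(counter), reuse_count=2)
print([connector.get() for _ in range(7)])  # [1, 1, 1, 2, 2, 2, 3]
```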

The auto.failover property enables automatic failover. By default, auto.failover is set to true.

# Whether or not failover will happen automatically when the primary
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.
auto.failover=true

Use the auto.reconfigure property to enable or disable automatic reconfiguration of the remaining standby servers after a standby is promoted to primary. Set the property to true (the default) to enable automatic reconfiguration or false to disable it. This property isn't required on a dedicated witness node. If you're using EDB Postgres Advanced Server or PostgreSQL version 11 or earlier, the recovery.conf file is backed up during the reconfiguration process.

# After a standby is promoted, Failover Manager will attempt to
# update the remaining standbys to use the new primary. For database
# versions before 12, Failover Manager will back up recovery.conf.
# Then it will change the host parameter of the primary_conninfo entry
# in recovery.conf or postgresql.auto.conf, and restart the database.
# The restart command is contained in either the efm_db_functions or
# efm_root_functions file; default when not running db as an os
# service is: "pg_ctl restart -m fast -w -t <timeout> -D <directory>"
# where the timeout is the local.timeout property value and the
# directory is specified by db.data.dir. To turn off
# automatic reconfiguration, set this property to false.
auto.reconfigure=true
Note

primary_conninfo is a space-delimited list of keyword=value pairs.
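The host change that auto.reconfigure makes to primary_conninfo can be sketched in a few lines of Python. The sketch rests on two assumptions. First, it treats the value as whitespace-separated keyword=value pairs. Second, it assumes no value contains quoted spaces: libpq accepts quoted values, and this simple split does not handle them. The function name is hypothetical, and the sketch is not the code Failover Manager runs.

```python
def replace_conninfo_host(primary_conninfo, new_host):
    """Point a primary_conninfo string at a new primary (illustration only)."""
    pairs = []
    for item in primary_conninfo.split():
        key, sep, value = item.partition("=")
        if key == "host":
            value = new_host  # only the host entry changes
        pairs.append(f"{key}{sep}{value}")
    return " ".join(pairs)


print(replace_conninfo_host(
    "user=replicator host=10.0.0.1 port=5432 application_name=node2",
    "10.0.0.3"))
# prints: user=replicator host=10.0.0.3 port=5432 application_name=node2
```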

Set the promotable property to false to indicate that a standby shouldn't be promoted. The promotable property is ignored when a primary agent starts, which simplifies switching back to the original primary after a switchover or failover. To override the setting at runtime, use the efm set-priority command. For more information about the efm set-priority command, see Using the efm utility.

# A standby with this set to false will not be added to the
# failover priority list, and so will not be available for
# promotion. The property will be used whenever an agent starts
# as a standby or resumes as a standby after being idle. After
# startup/resume, the node can still be added or removed from the
# priority list with the 'efm set-priority' command. This
# property is required for all non-witness nodes.
promotable=true

If the same amount of data was written to more than one standby node when a failover occurs, the use.replay.tiebreaker value determines how Failover Manager selects a replacement primary. Set use.replay.tiebreaker to true to instruct Failover Manager to fail over to the node that will come out of recovery faster, as determined by its replay log sequence number (LSN). To ignore the replay LSN and promote a node based on failover priority (user preference), set use.replay.tiebreaker to false.

# Use replay LSN value for tiebreaker when choosing a standby to
# promote before using failover priority. Set this property to true to
# consider replay location as more important than failover priority
# (as seen in cluster-status command) when choosing the "most ahead"
# standby to promote.
use.replay.tiebreaker=true
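Taken together, promotable and use.replay.tiebreaker describe a selection rule, and the following Python sketch shows one way to model it. It is an interpretation of the text above, not Failover Manager's algorithm. The sketch makes several assumptions:

- Nonpromotable standbys are excluded.
- The standbys with the most received WAL are kept.
- Among those, the tiebreaker orders by replay LSN and then by failover priority.
- With the tiebreaker off, only failover priority is used.

The `Standby` type and its fields are hypothetical.

```python
from dataclasses import dataclass


def parse_lsn(lsn):
    """Convert a Postgres LSN such as '0/3000060' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Standby:
    address: str
    received_lsn: str  # how much WAL the standby has written
    replay_lsn: str    # how much WAL the standby has replayed
    priority: int      # position in the failover priority list (1 = first)
    promotable: bool = True


def choose_standby(standbys, use_replay_tiebreaker=True):
    """Return the address of the standby to promote, or None (illustration)."""
    candidates = [s for s in standbys if s.promotable]
    if not candidates:
        return None
    most_data = max(parse_lsn(s.received_lsn) for s in candidates)
    tied = [s for s in candidates if parse_lsn(s.received_lsn) == most_data]

    def rank(s):
        if use_replay_tiebreaker:
            # Higher replay LSN first, then failover priority.
            return (-parse_lsn(s.replay_lsn), s.priority)
        return (s.priority,)

    return min(tied, key=rank).address
```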

Use the standby.restart.delay property to specify the time, in seconds, that a standby waits before it's reconfigured (stopped and restarted) to follow the new primary after a promotion.

# Time in seconds for this standby to delay restarting to follow the
# primary after a promotion. This can be used to have standbys restart
# at different times to increase availability. Caution should be used
# when using this feature, as a delayed standby will not be following
# the new primary and care must be taken that the new primary retains
# enough WAL for the standby to follow it.
# Please see the user's guide for more information.
standby.restart.delay=0

You can use the application.name property to provide the name of an application to copy to the primary_conninfo parameter before restarting an old primary node as a standby.

# During a switchover, recovery settings are copied from a standby
# to the original primary. If the application.name property is set,
# Failover Manager will replace the application_name portion of the
# primary_conninfo entry with this property value before starting
# the original primary database as a standby. If this property is
# not set, Failover Manager will remove the parameter value
# from primary_conninfo.
application.name=
Note

Set the application.name property on the primary and any promotable standby. In the event of a failover/switchover, the primary node can potentially become a standby node again.
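The application.name rule can be sketched the same way as the host rewrite. When the property is set, the sketch replaces the application_name entry in primary_conninfo. When the property is blank, it drops the entry. Dropping the whole keyword=value pair is an interpretation of "remove the parameter value" in the template comment. The sketch also assumes simple whitespace-separated pairs without quoted spaces, and the function name is hypothetical.

```python
def set_application_name(primary_conninfo, application_name):
    """Replace or drop application_name in primary_conninfo (illustration)."""
    pairs = []
    for item in primary_conninfo.split():
        key = item.partition("=")[0]
        if key != "application_name":
            pairs.append(item)
        elif application_name:
            pairs.append(f"application_name={application_name}")
        # If the property is blank, the pair is left out entirely.
    return " ".join(pairs)
```

For example, with application.name=node1, the value `host=10.0.0.3 port=5432 application_name=node2` becomes `host=10.0.0.3 port=5432 application_name=node1`.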

Use the restore.command property to instruct Failover Manager to update the restore_command value when a new primary is promoted. Use the %h placeholder to represent the address of the new primary; Failover Manager replaces %h with that address. %f and %p are placeholders that the database server itself replaces. If the property is left blank, Failover Manager doesn't update the restore_command values on the standbys after a promotion.

See the PostgreSQL documentation for more information about using a restore_command.

# If the restore_command on a standby restores directly from the
# primary node, use this property to have Failover Manager change
# the command when a new primary is promoted.
#
# Use the %h placeholder to represent the address of the new primary.
# During promotion it will be replaced with the address of the new
# primary.
#
# If not specified, failover manager will not change the
# restore_command value, if any, on standby nodes.
#
# Example:
# restore.command=scp <db service owner>@%h:/var/lib/edb/as12/data/archive/%f %p
restore.command=
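The division of labor between the placeholders is visible in a one-line Python sketch. Only %h is substituted when the new primary is known. %f and %p are left intact for the database server to expand each time it runs restore_command. The function name and the enterprisedb account in the example are hypothetical. The sketch also ignores the server's %% escape.

```python
def expand_restore_command(template, new_primary):
    """Substitute %h only; %f and %p stay for Postgres (illustration)."""
    return template.replace("%h", new_primary)


print(expand_restore_command(
    "scp enterprisedb@%h:/var/lib/edb/as12/data/archive/%f %p", "10.0.0.5"))
# prints: scp enterprisedb@10.0.0.5:/var/lib/edb/as12/data/archive/%f %p
```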

The synchronous_standby_names database parameter on the primary node specifies the names and count of the synchronous standby servers that must confirm receipt of data for write transactions on the primary to complete. When the reconfigure.num.sync property is set to true and the number of synchronous standbys drops below that count, Failover Manager reduces the required number of synchronous standby servers and reloads the configuration of the primary node to reflect the current value.

# Reduce num_sync when the number of synchronous standbys drops below
# the value required by the primary database. If set to true, Failover
# Manager will reduce the number of standbys needed in the primary's
# synchronous_standby_names property and reload the primary
# configuration. Failover Manager will not reduce the number below 1,
# taking the primary out of synchronous replication, unless the
# reconfigure.sync.primary property is also set to true.
# To raise num_sync, see the reconfigure.num.sync.max property below.
reconfigure.num.sync=false
Note

If you're using the reconfigure.num.sync property, make sure that the wal_sender_timeout value in the primary database is set to at least 10 seconds less than the efm.node.timeout value.
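The num_sync value lives inside the synchronous_standby_names string. PostgreSQL writes that string as `FIRST n (...)`, `ANY n (...)`, or `n (...)`, or as a bare name list, which implies a count of 1. The following Python sketch lowers n to the number of available synchronous standbys, never below 1, as the template comment describes. It only models the string rewrite. The function name is hypothetical, and the sketch doesn't cover how Failover Manager applies the change or reloads the primary.

```python
import re

# Optional FIRST/ANY keyword, the num_sync digits, then the opening parenthesis.
_NUM_SYNC = re.compile(r"^(\s*(?:FIRST|ANY)?\s*)(\d+)(\s*\()", re.IGNORECASE)


def reduce_num_sync(setting, available_sync_standbys):
    """Lower num_sync to the available standby count, never below 1.

    Returns the setting unchanged if it has no explicit num_sync or if
    enough synchronous standbys are available (illustration only).
    """
    match = _NUM_SYNC.match(setting)
    if not match:
        return setting
    current = int(match.group(2))
    target = max(1, min(current, available_sync_standbys))
    if target == current:
        return setting
    return f"{match.group(1)}{target}{match.group(3)}{setting[match.end():]}"
```

For example, `FIRST 2 (s1, s2, s3)` with only one synchronous standby connected becomes `FIRST 1 (s1, s2, s3)`.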

Use the reconfigure.num.sync.max property to specify the maximum value to which Failover Manager can raise num_sync when a standby is added to the cluster.

# If reconfigure.num.sync is set to true and this property is set,
# Failover Manager will check if num_sync can be raised when a standby
# is added to the cluster.
# Failover Manager will not raise the value above the maximum set here.
# If the primary database has been taken out of synchronous mode
# completely (see the reconfigure.sync.primary property), then Failover
# Manager will not reconfigure the primary database if standbys are
# added to the cluster.
reconfigure.num.sync.max=
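The raise rule amounts to a clamp. The following Python sketch assumes Failover Manager would raise num_sync toward the number of available synchronous standbys but never above reconfigure.num.sync.max. It also leaves the value alone in two cases: when the property is unset, or when the primary has been taken out of synchronous mode. The function and parameter names are hypothetical.

```python
def raised_num_sync(current, available_sync_standbys, num_sync_max,
                    primary_in_sync_mode=True):
    """Return num_sync after a standby joins (illustration only).

    num_sync_max is the reconfigure.num.sync.max value, or None if unset.
    """
    if num_sync_max is None or not primary_in_sync_mode:
        return current
    # Never lower the value here; only raise it, up to the configured maximum.
    return max(current, min(available_sync_standbys, num_sync_max))
```

For example, with num_sync at 1, three synchronous standbys available, and reconfigure.num.sync.max=2, the sketch returns 2.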

Set the reconfigure.sync.primary property to true to take the primary database out of synchronous replication mode if the number of standby nodes drops below the level required. Set reconfigure.sync.primary to false to send a notification if the standby count drops without interrupting synchronous replication.

# Take the primary database out of synchronous replication mode when
# needed. If set to true, Failover Manager will clear the
# synchronous_standby_names configuration parameter on the primary
# if the number of synchronous standbys drops below the required
# level for the primary to accept writes.
# If set to false, Failover Manager will detect the situation but
# will only send a notification if the standby count drops below the
# required level.
#
# CAUTION: TAKING THE PRIMARY DATABASE OUT OF SYNCHRONOUS MODE MEANS
# THERE MAY ONLY BE ONE COPY OF DATA. DO NOT MAKE THIS CHANGE UNLESS
# YOU ARE SURE THIS IS OK.
reconfigure.sync.primary=false
Note

If you're using the reconfigure.sync.primary property, ensure that the wal_sender_timeout value in the primary database is set to at least 10 seconds less than the efm.node.timeout value.

Use the minimum.standbys property to specify the minimum number of standby nodes to retain in a cluster. If the primary node fails and promoting a standby would drop the standby count below this minimum, Failover Manager doesn't promote a standby.

# Instead of setting specific standbys as being unavailable for
# promotion, this property can be used to set a minimum number
# of standbys that will not be promoted. Set to one, for
# example, promotion will not happen if it will drop the number
# of standbys below this value. This property must be the same on
# each node.
minimum.standbys=0
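Because a promotion turns one standby into the primary, the check reduces to simple arithmetic: promotion is allowed only if the standby count minus one is still at least minimum.standbys. The following Python sketch shows that arithmetic. The function name is hypothetical.

```python
def promotion_allowed(standby_count, minimum_standbys):
    """Promoting removes one standby; refuse if that drops below the minimum."""
    return standby_count - 1 >= minimum_standbys


for count in (1, 2, 3):
    print(f"{count} standbys, minimum.standbys=1:",
          promotion_allowed(count, minimum_standbys=1))
```

With minimum.standbys=1, a cluster with a single standby doesn't promote it. With two or more standbys, promotion proceeds.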

Use the priority.standbys property to specify the priority of standbys after this node is promoted.

# Space-separated list of standby addresses that are high priority for
# promotion when this node is the primary. If set, when this node is
# promoted, addresses in this list will be added to the front of the
# standby priority list. If this list contains addresses that are not
# standbys at the time of promotion, they will not be added.
priority.standbys=
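The reordering that priority.standbys describes can be sketched as a list operation. Listed addresses that are currently standbys move to the front in the order given. Addresses that aren't standbys are skipped. The remaining standbys keep their existing relative order, which is an assumption. The function name and the sample addresses are hypothetical.

```python
def priority_after_promotion(current_priority, priority_standbys, standbys):
    """Return the standby priority list after this node is promoted (sketch).

    current_priority:  standby addresses in current failover priority order
    priority_standbys: this node's priority.standbys value (space-separated)
    standbys:          addresses that are standbys at promotion time
    """
    front = []
    for address in priority_standbys.split():
        if address in standbys and address not in front:
            front.append(address)
    rest = [a for a in current_priority if a not in front]
    return front + rest
```

For example, with priority.standbys=10.0.0.4 10.0.0.9, where 10.0.0.9 isn't a standby, the list 10.0.0.2, 10.0.0.3, 10.0.0.4 becomes 10.0.0.4, 10.0.0.2, 10.0.0.3.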

Use the recovery.check.period property to specify the number of seconds Failover Manager waits between checks to see whether a promoting database is out of recovery.

# Time in seconds between checks to see if a promoting database
# is out of recovery.
recovery.check.period=1

Use the restart.connection.timeout property to specify the number of seconds for Failover Manager to attempt to connect to a newly reconfigured primary or standby node while the database on that node prepares to accept connections.

# Time in seconds to keep trying to connect to a database after a
# start or restart command returns successfully but the database
# is not ready to accept connections yet (a rare occurrence). This
# applies to standby databases that are restarted when being
# reconfigured for a new primary, and to primary databases that
# are stopped and restarted as standbys during a switchover.
# This retry mechanism is unrelated to the auto.resume.period
# parameter.
restart.connection.timeout=60
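restart.connection.timeout bounds a retry loop. The following Python sketch shows the general shape: keep attempting to connect until one attempt succeeds or the timeout elapses. The one-second retry interval, the `try_connect` callable, and the injectable clock are assumptions for illustration. The source doesn't say how often Failover Manager retries.

```python
import time


def wait_until_ready(try_connect, timeout, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call try_connect() until it returns True or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if try_connect():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The clock and sleep functions are parameters so that the loop can be exercised with a fake clock instead of waiting in real time.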

Use the auto.resume.period property to specify the period, in seconds, during which an idle agent attempts to resume monitoring its database. This property applies after a monitored database fails and the agent has assumed an idle state, or when the agent starts in IDLE mode.

# Period in seconds for IDLE agents to try to resume monitoring
# after a database failure or when starting in IDLE mode. Set to
# 0 for agents to not try to resume (in which case the
# 'efm resume <cluster>' command is used after bringing a
# database back up).
auto.resume.period=0

Failover Manager supports clusters that use a virtual IP (VIP). If your cluster uses a virtual IP, provide the host name or IP address in the virtual.ip property and the corresponding prefix length in the virtual.ip.prefix property. Leave virtual.ip blank to disable virtual IP support.

Use the virtual.ip.interface property to provide the network interface used by the VIP.

The specified virtual IP address is assigned only to the primary node of the cluster. If you specify virtual.ip.single=true, the same VIP address is used on the new primary if a failover occurs. Specify a value of false to provide a unique IP address for each node of the cluster.

For information about using a virtual IP address, see Using Failover Manager with virtual IP addresses.

# These properties specify the IP and prefix length that will be
# remapped during failover. If you do not use a VIP as part of
# your failover solution, leave the virtual.ip property blank to
# disable Failover Manager support for VIP processing (assigning,
# releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# If you specify a host name, it will be resolved to an IP address
# when acquiring or releasing the VIP. If the host name resolves
# to more than one IP address, there is no way to predict which
# address Failover Manager will use.
#
# By default, the virtual.ip and virtual.ip.prefix values must be
# the same across all agents. If you set virtual.ip.single to
# false, you can specify unique values for virtual.ip and
# virtual.ip.prefix on each node.
#
# If you are using an IPv4 address, the virtual.ip.interface value
# should not contain a secondary virtual ip id (do not include
# ":1", etc).
virtual.ip=
virtual.ip.interface=
virtual.ip.prefix=
virtual.ip.single=true
Note

If a primary agent starts and the node doesn't currently have the VIP, the Failover Manager agent acquires it. Stopping a primary agent doesn't drop the VIP from the node.
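The template comments state several rules for the virtual IP properties, and a pre-flight check can catch violations before an agent starts. The rules checked are:

- A blank virtual.ip disables VIP support.
- A VIP requires both an interface and a prefix.
- An IPv4 interface must not carry a secondary ID such as ":1".

The following Python sketch checks those rules, plus a prefix range check, using the standard ipaddress module. The function name and the dictionary input are hypothetical. Failover Manager performs its own validation.

```python
import ipaddress


def check_vip_properties(props):
    """Return a list of problems with the virtual.ip* properties (sketch)."""
    vip = props.get("virtual.ip", "").strip()
    if not vip:
        return []  # VIP support disabled
    problems = []
    interface = props.get("virtual.ip.interface", "").strip()
    prefix = props.get("virtual.ip.prefix", "").strip()
    if not interface:
        problems.append("virtual.ip.interface is required when virtual.ip is set")
    if not prefix:
        problems.append("virtual.ip.prefix is required when virtual.ip is set")
    try:
        address = ipaddress.ip_address(vip)
    except ValueError:
        return problems  # a host name; it is resolved when the VIP is acquired
    if prefix and (not prefix.isdigit() or int(prefix) > address.max_prefixlen):
        problems.append(f"virtual.ip.prefix {prefix!r} is not valid for {vip}")
    if address.version == 4 and ":" in interface:
        problems.append("virtual.ip.interface must not include a secondary id like ':1'")
    return problems
```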

Set the check.vip.before.promotion property to false to prevent Failover Manager from checking to see if a VIP is in use before assigning it to a new primary in case of a failure. This might result in multiple nodes broadcasting on the same VIP address. Unless the primary node is isolated or can be shut down via another process, set this property to true.

# Whether to check if the VIP (when used) is still in use before
# promoting after a primary failure. Turning this off may allow
# the new primary to have the VIP even though another node is also
# broadcasting it. This should only be used in environments where
# it is known that the failed primary node will be isolated or
# shut down through other means.
check.vip.before.promotion=true

Use the pgpool.enable property to specify whether to enable the Failover Manager and Pgpool integration for high availability. To enable Pgpool integration in non-sudo mode (running as the database owner), the PCPPASS file must be owned by the database owner's operating system user, and its file permissions must be set to 600.

# A boolean property to enable Failover Manager managed Pgpool HA.
# If enabled, Failover Manager would natively update the joining
# and leaving status of database nodes to active pgpool instance.
# Failover manager expects properly configured and running pgpool
# instances on required nodes. It does not manage setup and
# configuration of pgpool on any node.
#
# By default the property is disabled.
pgpool.enable=false

Use the following parameters to specify the values to use for Pgpool integration.

# Configurations required for pgpool integration.
# 'pcp.user' - User that would be invoking PCP commands
# 'pcp.host' - Virtual IP that would be used by pgpool. Same as
# pgpool parameter 'delegate_IP'
# 'pcp.port' - The port on which pgpool listens for pcp commands.
# 'pcp.pass.file' - Absolute path of PCPPASSFILE.
# 'pgpool.bin' - Absolute path of pgpool bin directory

# These properties are required if 'pgpool.enable' is set to true.
pcp.user=
pcp.host=
pcp.port=
pcp.pass.file=
pgpool.bin=
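The ownership and permission requirement for the PCPPASS file in non-sudo mode can be checked before you enable the integration. The following Python sketch uses `os.stat` for this check. It compares numeric user IDs. You might look up the database owner's ID with `pwd.getpwnam("enterprisedb").pw_uid`, where enterprisedb is a hypothetical owner name. The function name is also hypothetical.

```python
import os
import stat


def pcppass_ok(path, owner_uid):
    """True if path is owned by owner_uid and has mode 600 (sketch)."""
    info = os.stat(path)
    return info.st_uid == owner_uid and stat.S_IMODE(info.st_mode) == 0o600
```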

Use the following properties to provide paths to scripts that reconfigure your load balancer in case of a switchover or primary failure scenario. The scripts are also invoked when a standby failure occurs. If you're using these properties, provide them on every node of the cluster (primary, standby, and witness) to ensure that if a database node fails, another node will call the detach script with the failed node's address.

You don't need to set the following properties if you are using Pgpool as the load balancer solution and you have set the Pgpool integration properties.

Use the script.load.balancer.attach property to identify a script to invoke when you want to attach a node to the load balancer. Use the script.load.balancer.detach property to specify a script to invoke when you want to detach a node from the load balancer. Include the %h placeholder to represent the IP address of the node that's being attached or detached. Include the %t placeholder to instruct Failover Manager to pass the node type: a p (for a primary node) or an s (for a standby node).

# Absolute path to load balancer scripts
# The attach script is called when a node should be attached to
# the load balancer, for example after a promotion. The detach
# script is called when a node should be removed, for example
# when a database has failed or is about to be stopped. Use %h to
# represent the IP/hostname of the node that is being
# attached/detached. Use %t to represent the type of node being
# attached or detached: the letter m will be passed in for primary nodes
# and the letter s for standby nodes.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h %t
script.load.balancer.attach=
script.load.balancer.detach=
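The following is a minimal Python attach script that receives %h and %t as positional arguments. It assumes a configuration such as `script.load.balancer.attach=/somepath/lb_attach.py %h %t`, where the path and file name are hypothetical. The prose above gives p as the primary letter, while the template comment says m, so the sketch accepts either. A copy with `ACTION = "detach"` would serve as the detach script. The print statement stands in for whatever call your load balancer needs.

```python
#!/usr/bin/env python3
"""Hypothetical attach script for script.load.balancer.attach."""
import sys

ACTION = "attach"  # use "detach" in the copy configured as the detach script
# %t is documented as p for a primary in the prose and m in the template
# comment, so both letters are accepted.
NODE_TYPES = {"p": "primary", "m": "primary", "s": "standby"}


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} <address> <node type>", file=sys.stderr)
        return 2
    address, node_type = argv[1], argv[2]
    role = NODE_TYPES.get(node_type)
    if role is None:
        print(f"unknown node type {node_type!r}", file=sys.stderr)
        return 2
    # Replace this print with the call that updates your load balancer.
    print(f"{ACTION} {role} {address}")
    return 0

# As a standalone executable, finish the file with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv))
```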

Use the detach.on.agent.failure property to control whether Failover Manager detaches a node from the load balancer when the primary agent fails but the database is still reachable. Set it to false to keep the node attached in that scenario. The default value is true.

# If set to true, Failover Manager will detach the node from load
# balancer if the primary agent fails but the database is still
# reachable. In most scenarios this is NOT the desired situation. In
# scenarios where the detach script should run with a failed primary
# agent, even when the primary database is still healthy this parameter
# should be set to true. If no value specified it defaults to true (for
# backwards compatibility).
# This is not applicable for standbys.
detach.on.agent.failure=

The script.fence property specifies the path to an optional user-supplied script to invoke during the promotion of a standby node to primary node.

# absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run
# during failover on the standby database node. If left blank,
# no action will be taken. If specified, EFM will execute this
# script before promoting the standby.
#
# Parameters can be passed into this script for the failed primary
# and new primary node addresses. Use %p for new primary and %f
# for failed primary. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO EXIT
# CODE.
script.fence=
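Because a nonzero exit code from the fencing script stops the failover, such a script should fail closed. It should exit 0 only when fencing demonstrably succeeded. The following Python skeleton assumes the `script.fence=/somepath/myscript %p %f` argument order from the template. `fence_failed_primary` is a hypothetical placeholder for your site-specific isolation step, such as a power or network API call. It deliberately fails until you implement it.

```python
#!/usr/bin/env python3
"""Hypothetical fencing script for: script.fence=/somepath/fence.py %p %f"""
import sys


def fence_failed_primary(failed_primary):
    """Placeholder: isolate the old primary and return True on success."""
    raise NotImplementedError("site-specific fencing goes here")


def main(argv, fence=fence_failed_primary):
    if len(argv) != 3:
        print(f"usage: {argv[0]} <new primary> <failed primary>", file=sys.stderr)
        return 1
    new_primary, failed_primary = argv[1], argv[2]
    try:
        fenced = fence(failed_primary)
    except Exception as exc:
        print(f"fencing {failed_primary} failed: {exc}", file=sys.stderr)
        return 1
    if not fenced:
        return 1  # non-zero: Failover Manager won't promote new_primary
    print(f"fenced {failed_primary} before promoting {new_primary}")
    return 0

# As a standalone executable, finish the file with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv))
```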

Use the script.post.promotion property to specify the path to an optional user-supplied script to invoke after a standby node is promoted to primary.

# Absolute path to fencing script run after promotion
#
# This is an optional user-supplied script that will be run after
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a
# notification sent after the script executes.
#
# Parameters can be passed into this script for the failed primary
# and new primary node addresses. Use %p for new primary and %f
# for failed primary. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p
script.post.promotion=

Use the script.resumed property to specify the path to an optional user-supplied script to invoke before an idle agent resumes monitoring its database.

# Absolute path to resume script
#
# This script is run before an IDLE agent resumes
# monitoring its local database.
script.resumed=

Use the script.db.failure property to specify the complete path to an optional user-supplied script that Failover Manager invokes if an agent detects that the database that it monitors has failed.

# Absolute path to script run after database failure
# This is an optional user-supplied script that will be run after
# an agent detects that its local database has failed.
script.db.failure=

Use the script.primary.isolated property to specify the complete path to an optional user-supplied script that Failover Manager invokes if the agent monitoring the primary database detects that the primary is isolated from the majority of the Failover Manager cluster. This script is called immediately after the VIP is released (if a VIP is in use).

# Absolute path to script run on isolated primary
# This is an optional user-supplied script that will be run after
# a primary agent detects that it has been isolated from the
# majority of the efm cluster.
script.primary.isolated=

Use the script.remote.pre.promotion property to specify the path and name of a script to invoke on any agent nodes not involved in the promotion when a node is about to promote its database to primary.

Include the %p placeholder to identify the address of the new primary node.

# Absolute path to script invoked on non-promoting agent nodes
# before a promotion.
#
# This optional user-supplied script will be invoked on other
# agents when a node is about to promote its database. The exit
# code from this script has no effect on Failover Manager, but
# will be included in a notification sent after the script
# executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.pre.promotion=/path_name/script_name %p
script.remote.pre.promotion=

Use the script.remote.post.promotion property to specify the path and name of a script to invoke on any nonprimary nodes after a promotion occurs.

Include the %p placeholder to identify the address of the new primary node.

# Absolute path to script invoked on non-primary agent nodes
# after a promotion.
#
# This optional user-supplied script will be invoked on nodes
# (except the new primary) after a promotion occurs. The exit code
# from this script has no effect on Failover Manager, but will be
# included in a notification sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.post.promotion=/path_name/script_name %p
script.remote.post.promotion=

Use the script.custom.monitor property to provide the name and location of an optional script to invoke on regular intervals, specified in seconds by the custom.monitor.interval property.

Use custom.monitor.timeout to specify the maximum time for the script to run. If script execution doesn't finish in the time specified, Failover Manager sends a notification.

Set custom.monitor.safe.mode to true to instruct Failover Manager to report nonzero exit codes from the script without treating them as a database failure or promoting a standby.

# Absolute path to a custom monitoring script.
#
# Use script.custom.monitor to specify the location and name of
# an optional user-supplied script that will be invoked
# periodically to perform custom monitoring tasks. A non-zero
# exit value means that a check has failed; this will be treated
# as a database failure. On a primary node, script failure will
# cause a promotion. On a standby node script failure will
# generate a notification and the agent will become IDLE.
#
# The custom.monitor.* properties are required if a custom
# monitoring script is specified:
#
# custom.monitor.interval is the time in seconds between executions
# of the script.
#
# custom.monitor.timeout is a timeout value in seconds for how
# long the script will be allowed to run. If script execution
# exceeds the specified time, the task will be stopped and a
# notification sent. Subsequent runs will continue.
#
# If custom.monitor.safe.mode is set to true, non-zero exit codes
# from the script will be reported but will not cause a promotion
# or be treated as a database failure. This allows testing of the
# script without affecting EFM.
#
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
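A custom monitor is any executable whose exit code signals health. The following Python sketch fails when the file system holding the data directory is nearly full. The data directory path and the 5% threshold are hypothetical, so adjust them to your db.data.dir and policy. On a primary, a nonzero exit causes a promotion, so consider starting with custom.monitor.safe.mode=true while you test. The template suggests this approach.

```python
#!/usr/bin/env python3
"""Hypothetical custom monitor, configured for example as:
    script.custom.monitor=/somepath/check_disk.py
    custom.monitor.interval=30
    custom.monitor.timeout=10
    custom.monitor.safe.mode=true
"""
import shutil
import sys

DATA_DIR = "/var/lib/edb/as12/data"  # hypothetical; match db.data.dir
MIN_FREE_FRACTION = 0.05             # hypothetical threshold


def check(path=DATA_DIR, min_free=MIN_FREE_FRACTION):
    """Return 0 if enough space is free on path's file system, else 1."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free:
        print(f"only {free_fraction:.1%} free on {path}", file=sys.stderr)
        return 1
    return 0

# As a standalone executable, finish the file with:
#     if __name__ == "__main__":
#         sys.exit(check())
```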

Use the sudo.command property to specify a command for Failover Manager to invoke when performing tasks that require extended permissions. Use this option to include command options that might be specific to your system authentication.

Use the sudo.user.command property to specify a command for Failover Manager to invoke when executing commands performed by the database owner.

# Command to use in place of 'sudo' if desired when efm runs
# the efm_db_functions or efm_root_functions, or efm_address
# scripts.
# Sudo is used in the following ways by efm:
#
# sudo /usr/edb/efm-<version>/bin/efm_address <arguments>
# sudo /usr/edb/efm-<version>/bin/efm_root_functions <arguments>
# sudo -u <db service owner> /usr/edb/efm-<version>/bin/efm_db_functions <arguments>
#
# 'sudo' in the first two examples will be replaced by the value
# of the sudo.command property. 'sudo -u <db service owner>' will
# be replaced by the value of the sudo.user.command property.
# The '%u' field will be replaced with the db owner.
sudo.command=sudo
sudo.user.command=sudo -u %u
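The following Python sketch illustrates how the two properties wrap the helper invocations listed in the template comment. It substitutes %u with the database owner, splits the prefix shell-style, and prepends it to the helper command. Whether Failover Manager splits the value exactly like `shlex.split` is an assumption. The function name, the enterprisedb owner, and the arg1 argument are hypothetical.

```python
import shlex


def build_command(prefix_property, script, *args, db_owner=None):
    """Prepend a sudo.command or sudo.user.command value to a helper call."""
    prefix = prefix_property
    if db_owner is not None:
        prefix = prefix.replace("%u", db_owner)  # %u is the db owner
    return shlex.split(prefix) + [script, *args]


print(build_command("sudo -u %u", "/usr/edb/efm-4.8/bin/efm_db_functions",
                    "arg1", db_owner="enterprisedb"))
# prints: ['sudo', '-u', 'enterprisedb', '/usr/edb/efm-4.8/bin/efm_db_functions', 'arg1']
```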

Use the lock.dir property to specify an alternative location for the Failover Manager lock file. The file prevents Failover Manager from starting multiple, potentially orphaned, agents for a single cluster on the node.

# Specify the directory of lock file on the node. Failover
# Manager creates a file named <cluster>.lock at this location to
# avoid starting multiple agents for same cluster. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/lock/efm-<version>'
lock.dir=
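The source describes the lock file's purpose, a <cluster>.lock file that prevents a second agent for the same cluster, but not its mechanism. The following Python sketch shows one common way to get that behavior: an exclusive, nonblocking `fcntl.flock` on the lock file. Using flock is an assumption for illustration, not a description of Failover Manager. The function name is hypothetical.

```python
import fcntl
import os


def acquire_agent_lock(lock_dir, cluster):
    """Return an open, locked file object, or None if another holder exists."""
    os.makedirs(lock_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(lock_dir, f"{cluster}.lock")
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep it open; closing it releases the lock
```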

Use the log.dir property to specify the location to write agent log files. Failover Manager attempts to create the directory if the directory doesn't exist.

# Specify the directory of agent logs on the node. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/log/efm-<version>'. (To store
# Failover Manager startup logs in a custom location, modify the
# path in the service script to point to an existing, writable
# directory.)
# If using a custom log directory, you must configure
# logrotate separately. Use 'man logrotate' for more information.
log.dir=

After enabling the UDP or TCP protocol on a Failover Manager host, you can enable logging to syslog. Use the syslog.host parameter to specify the syslog host, the syslog.protocol parameter to specify the protocol type (UDP or TCP), and the syslog.port parameter to specify the listener port of the syslog host. You can use the syslog.facility value as an identifier for the process that created the entry. Use a value between LOCAL0 and LOCAL7.

# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
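To confirm that a syslog listener accepts entries with these settings before you set syslog.enabled=true, you can send a test entry from the same host. The following Python sketch builds a standard-library `SysLogHandler` from the syslog.* values. The function name and the dictionary input are hypothetical. With TCP, the handler connects when it's created, so creation fails if nothing is listening.

```python
import logging
import logging.handlers
import socket


def syslog_handler_from_properties(props):
    """Build a SysLogHandler from syslog.* property values (sketch)."""
    protocol = props.get("syslog.protocol", "UDP").upper()
    if protocol not in ("UDP", "TCP"):
        raise ValueError(f"unsupported syslog.protocol {protocol!r}")
    facility = props.get("syslog.facility", "LOCAL1").lower()
    if facility not in {f"local{i}" for i in range(8)}:
        raise ValueError("syslog.facility must be LOCAL0 through LOCAL7")
    socktype = socket.SOCK_DGRAM if protocol == "UDP" else socket.SOCK_STREAM
    return logging.handlers.SysLogHandler(
        address=(props.get("syslog.host", "localhost"),
                 int(props.get("syslog.port", "514"))),
        facility=logging.handlers.SysLogHandler.facility_names[facility],
        socktype=socktype,
    )
```

To send a test entry, attach the handler to a logger, for example `logging.getLogger("efm-syslog-test").addHandler(handler)`, and log a warning. Then look for the entry in the syslog output for the chosen facility.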

Use the file.log.enabled and syslog.enabled properties to specify the type of logging that you want to implement. Set file.log.enabled to true to enable logging to a file. Enable the UDP protocol or TCP protocol and set syslog.enabled to true to enable logging to syslog. You can enable logging to both a file and syslog.

# Which logging is enabled.
file.log.enabled=true
syslog.enabled=false

For more information about configuring syslog logging, see Enabling syslog log file entries.

Use the jgroups.loglevel and efm.loglevel parameters to specify the level of detail logged by Failover Manager. The default value is INFO. For more information about logging, see Controlling logging.

# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.
# TRACE level logging should be used for diagnosing problems only.
# It is not supported for production use.
jgroups.loglevel=INFO
efm.loglevel=INFO

Use the jvm.options property to pass JVM-related configuration information. The default setting specifies the amount of memory that the Failover Manager agent can use.

# Extra information that will be passed to the JVM when starting
# the agent.
jvm.options=-Xmx128m
