Configuring Pgpool-II v1.0

Note

The configuration options for Pgpool-II are extensive; consider the options listed below as a starting point only. For more information about configuring and using Pgpool-II, please consult the project website.

Commonly Used Parameters

The following table lists pgpool.conf parameters to use when implementing connection pooling:

| Parameter Name | Description |
| --- | --- |
| listen_addresses | Hostname or IP address used by Pgpool-II to listen for connections. The default is localhost. Change to '*' to listen on all addresses. |
| port | Port for Pgpool-II connections. The default is 9999. |
| pcp_port | Port for PCP connections. The default is 9898. |
| backend_hostname0 | Hostname or IP address for backend 0. You can specify '' if the backend and Pgpool-II are running on the same host. |
| backend_weight0 | Weight for backend 0 (load balancing mode only). Specify 1 for each backend to balance the load equally, or decimal values (.9, .1, and so on) to weight the load toward specific backends. |
| backend_port0 | Port number for backend 0. |
| backend_data_directory0 | Data directory for backend 0. |
| enable_pool_hba | Set to on to use pool_hba.conf for client authentication. |
| num_init_children | Number of pools. The default is 32. |
| max_pool | Number of connections per pool. The default is 4. |
| connection_cache | Set to on to enable connection pooling. |
| pool_conn_dbname | Database name to which Pgpool-II connects. By default, Pgpool-II connects with Postgres. Note that the pool_conn_dbname parameter is now deprecated. |
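
For example, a minimal connection pooling section of pgpool.conf built from the parameters above might look like the following. This is a sketch only; adjust addresses, ports, and pool sizes to your environment:

listen_addresses = '*'
port = 9999
connection_cache = on
num_init_children = 32
max_pool = 4
enable_pool_hba = on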

The following table lists pgpool.conf parameters to use when implementing replication and load balancing:

| Parameter Name | Description |
| --- | --- |
| allow_sql_comments | If on, ignore SQL comments; modifications to this parameter require a reload of the pgpool.conf file. |
| load_balance_mode | Set to on to activate load balancing mode. If load_balance_mode is on and replicate_select is off, SELECT statements are sent to one backend. The backend_weight<N> parameters determine the proportion of SELECT statements each backend receives. |
| ignore_leading_white_space | Ignore leading white space in each query. Certain APIs, such as DBI/DBD::Pg for Perl, add white space that the user cannot control. The default is on. |
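
For example, a load balancing section of pgpool.conf using the parameters above might look like the following sketch:

load_balance_mode = on
ignore_leading_white_space = on
allow_sql_comments = on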

Configuring Connection Pooling

Pgpool-II provides a set of child processes that maintain cached connections to one or more database servers. When a client connects, Pgpool-II attempts to reuse a connection from its pool, thus avoiding the overhead of opening and closing client connections.

A connection in the pool can be reused only when the target database and the connection user match those of a connection already in the pool. The pgpool.conf file specifies the connection pooling configuration options (such as the number of child processes and the maximum number of cached connections per child).
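
You can observe this reuse behavior with the pg_backend_pid() function. If two consecutive connections with the same user and database are served by the same Pgpool-II child process, both report the same backend process ID. A sketch, assuming Pgpool-II listens on port 9999 of localhost and the edb database exists:

psql -h localhost -p 9999 -U enterprisedb -d edb -c 'SELECT pg_backend_pid();'
psql -h localhost -p 9999 -U enterprisedb -d edb -c 'SELECT pg_backend_pid();'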

To configure connection pooling with one database server:

  1. Configure the pg_hba.conf file on the Pgpool-II host to permit connections between the clients and the server.

  2. Copy the pgpool.conf.sample file to pgpool.conf, modify the file, set the connection_cache parameter to on, and specify connection properties for your database server.

    The following example shows how to connect to EDB Postgres Advanced Server:

    connection_cache = on
    backend_hostname0 = 'localhost'
    backend_port0 = 5444
    backend_weight0 = 1
    backend_data_directory0 = '/var/lib/edb/as13/data'

    The following example shows how to connect to a PostgreSQL server:

    connection_cache = on
    backend_hostname0 = 'localhost'
    backend_port0 = 5432
    backend_weight0 = 1
    backend_data_directory0 = '/var/lib/pgsql/13/data'

    Note

    In the pgpool.conf file, connection parameters have an appended digit that specifies a cluster node identifier. Database node 0 specifies values for the primary node.

  3. Optionally, configure Pgpool-II client authentication.

  4. Optionally, configure the PCP administrative interface.

  5. Start Pgpool-II with the following command, then begin using your application:

    systemctl start edb-pgpool-<x.y>.service

    where <x.y> is the Pgpool release version.

  6. Execute the following platform-specific command to connect to Pgpool-II:

    On EDB Postgres Advanced Server for CentOS 7:

    ./psql -d edb -p 9999 -U enterprisedb -h /tmp

    On EDB Postgres Advanced Server for Debian:

    ./psql -d edb -p 9999 -U enterprisedb

    On PostgreSQL Server for CentOS 7:

    ./psql -d postgres -p 9999 -U postgres -h /tmp

    On PostgreSQL Server for Debian:

    ./psql -d postgres -p 9999 -U postgres
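
    To confirm that pooling is active, you can issue Pgpool-II's SHOW pool_pools pseudo-SQL command through the same connection. The following sketch uses the EDB Postgres Advanced Server example on CentOS 7; adjust the database, user, and host options for your platform:

    ./psql -d edb -p 9999 -U enterprisedb -h /tmp -c 'SHOW pool_pools;'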

Configuring Load Balancing

EDB supports replication scenarios that use Pgpool-II load balancing with PostgreSQL streaming replication or Slony replication. The supported replication methods ensure that database updates made by client applications apply to multiple backend servers. For detailed information about the benefits of each replication method and detailed configuration instructions, please review project documentation for each utility.

When load balancing is enabled, Pgpool-II distributes some types of SELECT statements to backend servers, allowing multiple database servers and hosts to share the processing load of SELECT statements issued by client applications.

When configuring Pgpool-II load balancing, the initial database environments in all backend servers must be identical:

  • Tables must have the same name, definition, and row content.
  • Schemas must exist in each backend application database.
  • Configure roles and privileges on each backend server to ensure the result set of SQL statements are identical on all servers.

If you use password authentication, assign the same password to the associated user name on each database server, and use the same user name/password pair when connecting Pgpool-II to each backend.

Within a replication scenario, each backend is uniquely identified by the hostname (or IP address) and the port number on which the database server instance is listening for connections. You must ensure that the pool_hba.conf and pg_hba.conf files allow a connection between that server and the host on which Pgpool-II will be running.
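
For example, if Pgpool-II runs on a host with the placeholder address 192.168.1.50, the pg_hba.conf file on each backend might include an entry such as the following (the address and authentication method are assumptions; substitute your own):

host all all 192.168.1.50/32 md5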

The following example demonstrates how to implement Pgpool-II load balancing with two servers (the primary and replica nodes) in a Streaming Replication scenario. Configuring Pgpool-II load balancing for a Slony replication scenario is similar; please see the Slony documentation for information about configuring Slony replication.

Configuring the Primary Node of the Replication Scenario

Open an SSH session with the primary node of the replication scenario, and modify the pg_hba.conf file (located in the /var/lib/edb/as13/data directory), adding connection information for the replication user (in the example that follows, edbrepuser resides on a standby node with an IP address of 107.178.217.178):

host replication edbrepuser 107.178.217.178/32 md5

The connection information should specify the address of the replication scenario's standby node and your preferred authentication method.

Modify the postgresql.conf file (located in /var/lib/edb/as13/data), adding the following replication parameters and values to the end of the file. (On version 13, wal_keep_size replaces the older wal_keep_segments parameter, and checkpoint_segments no longer exists.)

wal_level = replica
max_wal_senders = 10
wal_keep_size = 0

Save the configuration file, and restart the server:

systemctl restart edb-as-13

Note

The above command is applicable for RHEL/CentOS 7 and 8 platforms.

Use the sudo su - command to assume the identity of the enterprisedb database superuser:

sudo su - enterprisedb

Then, start a psql session, connecting to the edb database:

psql -d edb

At the psql command line, create a user with the replication attribute:

CREATE ROLE edbrepuser WITH REPLICATION LOGIN PASSWORD 'password';

Configuring the Standby Node of the Replication Scenario

Open an SSH session with the standby server, and assume the identity of the database superuser (enterprisedb):

sudo su - enterprisedb

With your choice of editor, create a .pgpass file in the home directory of the enterprisedb user. The .pgpass file holds the password of the replication user in plain-text form; if you use a .pgpass file, ensure that only trusted users have access to it.

Add an entry that specifies connection information for the replication user:

*:5444:*:edbrepuser:password

The server will enforce restrictive permissions on the .pgpass file; use the following command to set the file permissions:

chmod 600 .pgpass

Relinquish the identity of the database superuser:

exit

Then, assume superuser privileges:

sudo su -

Use your platform-specific command to stop the database server before replacing the data directory on the standby node with the data directory of the primary node.

Then, delete the data directory on the standby node:

rm -rf /var/lib/edb/as13/data

After deleting the existing data directory, use the pg_basebackup utility to copy the data directory of the primary node to the standby:

pg_basebackup -D /var/lib/edb/as13/data --format=p --label=standby --host=146.148.46.44 --username=edbrepuser --password --wal-method=stream -R

The call to pg_basebackup specifies the target data directory (-D), the IP address of the primary node, and the name of the replication user created on the primary node.

Including the -R option creates the standby.signal file and appends connection settings to postgresql.auto.conf in the output directory (or into the base archive file when using tar format) to ease setting up a standby server.

For more information about the options available with the pg_basebackup utility, see the PostgreSQL core documentation at:

https://www.postgresql.org/docs/current/static/app-pgbasebackup.html

When prompted by pg_basebackup, provide the password associated with the replication user.

After copying the data directory, change ownership of the directory to the database superuser (enterprisedb):

chown -R enterprisedb /var/lib/edb/as13/data

Modify the postgresql.conf file (located in /var/lib/edb/as13/data), specifying the following values at the end of the file:

wal_level = replica
hot_standby = on

The data directory was copied from the primary node, so it already contains the replication parameters specified previously.

Then, restart the server. At this point, the primary node will be replicating data to the standby node.
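
To verify that streaming replication is working, you can query the pg_stat_replication view on the primary node (as the enterprisedb user); the standby's address should appear in the client_addr column:

psql -d edb -c 'SELECT client_addr, state FROM pg_stat_replication;'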

Configuring Pgpool-II Load Balancing

Use your choice of editor to modify the pgpool.conf file. Within the pgpool.conf file, modify the parameter settings to specify that load balancing is enabled:

load_balance_mode = on

Then, specify the connections settings for the primary database node in the parameter set that ends with a 0. For example:

backend_hostname0 = '146.148.46.44'
backend_port0 = 5444
backend_weight0 = 1
backend_data_directory0 = '/var/lib/edb/as13/data'

Then, specify the connections settings for each node to which queries will be distributed. Increment the number that follows the parameter name for each node, and provide connection details:

backend_hostname1 = '107.178.217.178'
backend_port1 = 5444
backend_weight1 = 1
backend_data_directory1 = '/var/lib/edb/as13/data'

Use the backend_weight parameter to specify how queries will be distributed amongst the nodes. Specify a value of 1 to indicate that you wish (qualified) queries to be equally distributed across the nodes of the replication scenario.
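
For example, to send roughly 80 percent of qualifying SELECT statements to the standby and 20 percent to the primary, you could weight the backends unevenly (illustrative values; the weights are relative to each other):

backend_weight0 = 0.2
backend_weight1 = 0.8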

Restart Pgpool-II:

systemctl restart edb-pgpool-<x.y>.service

where <x.y> is the Pgpool release version.

Configuring Client Authentication

When Pgpool-II is enabled, client applications connect to Pgpool-II, which acts as a middleman for a Postgres server. A connecting client application is first authenticated with the Pgpool-II server and then authenticated with the Postgres server.

Parameter settings in the pool_hba.conf configuration file determine the Pgpool-II authentication properties. The pool_hba.conf file is similar in format and function to the Postgres pg_hba.conf configuration file. Please refer to the Pgpool-II documentation for detailed information about pool_hba.conf entries.

To enable Pgpool-II authentication:

  1. Copy the pool_hba.conf.sample file to pool_hba.conf.
  2. Modify the pool_hba.conf file, specifying authentication information for the servers or users that you want to allow to connect (see the example following the note below). Entries must follow the same format used in the pg_hba.conf file.
  3. Modify the pgpool.conf file, setting the enable_pool_hba parameter to on.
  4. Restart Pgpool-II to reload the Pgpool-II configuration files.
Note

When authenticating with the database server, use the user names and passwords specified in the pool_hba.conf file; you must also specify those user names and passwords in the database server's pg_hba.conf file.
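
For example, a pool_hba.conf entry that allows md5-authenticated connections from clients on an application subnet might look like the following (the subnet is a placeholder):

# TYPE DATABASE USER ADDRESS METHOD
host all all 192.168.1.0/24 md5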

Configuring PCP

PCP is an administrative interface for Pgpool-II that allows you to retrieve information about database nodes, Pgpool-II child processes, and other information. You should issue PCP commands from the Linux command line.

pcp.conf is the password configuration file for the PCP client. Before using PCP commands, you must modify the pcp.conf file, adding the user names and passwords that you will supply when invoking a PCP command. The user names in the pcp.conf file are entirely independent of the database server user names and passwords.

Use the following steps to enable PCP:

  1. Copy the pcp.conf.sample file to pcp.conf.

  2. Add an entry to the pcp.conf file in the following form:

    username:md5_password

    where:

    username is a PCP user name.

    md5_password is the PCP password in md5 format.

    You can use the pg_md5 program to generate the encrypted password from the clear-text form as shown below:

    $ pg_md5 mypassword

    34819d7beeabb9260a5c854bc85b3e44

    For example, the entry in the pcp.conf file for a PCP user named pcpuser with the password of mypassword is:

    # USERID:MD5PASSWD
    pcpuser:34819d7beeabb9260a5c854bc85b3e44

  3. Restart the Pgpool service.

  4. When issuing a PCP command, specify the PCP user name and the unencrypted form of the password:

$ pcp_node_info 5 localhost 9898 pcpuser mypassword 0
localhost 5444 1 1.000000

After configuring PCP, you can use the following PCP commands to control Pgpool-II and retrieve information.

| PCP Command | Description |
| --- | --- |
| pcp_common_options | Common options used by PCP commands |
| pcp_node_count | Displays the total number of database nodes |
| pcp_node_info | Displays information about the given node ID |
| pcp_health_check_stats | Displays health check statistics for the given node ID |
| pcp_watchdog_info | Displays the watchdog status of Pgpool-II |
| pcp_proc_count | Displays the list of Pgpool-II child process IDs |
| pcp_proc_info | Displays information about the given Pgpool-II child process ID |
| pcp_pool_status | Displays the parameter values as defined in pgpool.conf |
| pcp_detach_node | Detaches the given node from Pgpool-II. Existing connections to Pgpool-II are forcibly disconnected. |
| pcp_attach_node | Attaches the given node to Pgpool-II |
| pcp_promote_node | Promotes the given node as the new main node to Pgpool-II |
| pcp_stop_pgpool | Terminates the Pgpool-II process |
| pcp_reload_config | Reloads the Pgpool-II configuration file |
| pcp_recovery_node | Attaches the given backend node with recovery |
Note

pcp_health_check_stats and pcp_reload_config commands are available from Pgpool version 4.2 onwards.

To view more information about PCP command options, visit the Pgpool project site.

Configuring Number of Connections and Pooling

Pgpool-II provides several configuration parameters for tuning pooling and connection processing. Depending on these settings, the Postgres max_connections parameter must also be set so that all required connections can be accepted. Furthermore, note that the cloud architecture works with active/active instances, which requires spreading num_init_children over all Pgpool-II instances (divide the normally used value by the number of active instances). The following paragraphs describe the effect of each parameter and advise values for both the on-premise and the cloud architecture.

max_pool: It is generally advised to set max_pool to 1. Alternatively, for applications with many reconnects, max_pool can be set to the number of distinct combinations of users, databases, and connection options used by the application's connections. All but one connection in each pool would be stale connections, which consume a connection slot from Postgres without adding to performance. It is therefore advised not to configure max_pool beyond 4, to preserve a healthy ratio between active and stale connections. As an example, for an application that constantly reconnects and uses two distinct users, each connecting to its own database, set it to 2. If both users can connect to both databases, set it to 4. Note that increasing max_pool requires tuning down num_init_children in Pgpool-II, or tuning up max_connections in Postgres.

num_init_children: It is advised to set num_init_children to the number of connections that could be active in parallel, divided by the number of active Pgpool-II instances (one with the on-premise architecture; all instances for the cloud architecture). As an example, in an architecture with 3 Pgpool-II instances, to allow the application to have 100 active connections in parallel, set num_init_children to 100 for the on-premise architecture and to 33 for the cloud architecture. Note that increasing num_init_children generally requires tuning up max_connections in Postgres.

listen_backlog_multiplier: Sets the ratio between the number of open connections (as perceived by the application) and the number of active connections (num_init_children). As an example, when the application might open 500 connections, of which 100 should be active in parallel, with the on-premise architecture num_init_children should be set to 100 and listen_backlog_multiplier should be set to 4. This setup can process 100 connections in parallel while another 400 (listen_backlog_multiplier x num_init_children) connections are queued before connections are blocked. The application would perceive a total of 500 open connections, and Postgres would process the load of at most 100 connections at all times. Note that increasing listen_backlog_multiplier only causes the application to perceive more open connections; it does not increase the number of parallel active connections (which is determined by num_init_children).

max_connections: It is advised to set max_connections in Postgres equal to or higher than [number of active Pgpool-II instances] x [max_pool] x [num_init_children] + [superuser_reserved_connections] (Postgres). As an example, in the on-premise setup with 3 instances active/passive, max_pool set to 2, num_init_children set to 100, and superuser_reserved_connections (Postgres) set to 5, Postgres max_connections should be set equal to or higher than [1 x 2 x 100 + 5], which is 205 connections. A similar setup in the cloud would run with 3 active instances, max_pool set to 2, num_init_children set to 33, and superuser_reserved_connections (Postgres) set to 5, in which case Postgres max_connections should be set equal to or higher than [3 x 2 x 33 + 5], which is 203. Note that configuring below the advised setting can cause issues opening new connections and, in combination with max_pool, can cause unexpected behavior (low or no active connections but still connection issues, due to stale pooled connections using connection slots from Postgres). For more information on the relation between num_init_children, max_pool, and max_connections, see this background information.
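
As a worked example, the cloud setup described above (3 active Pgpool-II instances serving 100 parallel connections in total) could be sketched as follows; the listen_backlog_multiplier value is an assumption, so size it to your own queueing needs:

# pgpool.conf on each of the 3 active Pgpool-II instances
num_init_children = 33
max_pool = 2
listen_backlog_multiplier = 2   # assumption: queue up to 2 x 33 connections per instance

# postgresql.conf on the backend:
# max_connections >= 3 x 2 x 33 + 5 = 203
max_connections = 210
superuser_reserved_connections = 5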

Pgpool-II Host Setup

After modifying the parameter settings that implement Pgpool-II functionality for your installation, you must start the Pgpool-II service.

When Pgpool-II starts, it records its process ID in a file whose name is determined by the pid_file_name configuration parameter. The initial value of the pid_file_name parameter in the sample file is:

pid_file_name = /var/run/edb/pgpool<x.y>/edb-pgpool-<x.y>.pid

Where <x.y> is the Pgpool release version.

Note

The operating system may remove the contents of the /var/run directory (including the pgpool directory) during a reboot. You should not use the /var/run/edb/pgpool directory as the location for the pgpool.pid file. Modify the pid_file_name parameter to specify a safer directory location.
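
For example, you might point the parameter at a persistent location (the path below is a placeholder; choose a directory that survives reboots and is writable by the Pgpool-II service account):

pid_file_name = '/var/lib/edb/pgpool/edb-pgpool.pid'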