Maintenance commands through proxies v5

Maintenance and performance

As a general rule, you should never perform maintenance operations on a cluster's write leader. Maintenance operations such as VACUUM can be quite disruptive to the smooth running of a busy server and often detrimental to workload performance. Therefore, it's best to run maintenance commands on any node in a group that isn't the write leader. Generally, this requires you to connect directly and issue the maintenance commands on the non-write-leader nodes. But in some situations, this isn't possible.

Maintenance and proxies

Proxies, by design, always connect to and send commands to the current write leader. This usually means that you must not connect by way of a proxy to perform maintenance. PGD clusters nodes can present a direct connection for psql and PGD CLI clients that you can use to issue maintenance commands to the server on those nodes. But there are environment in which the PGD cluster is deployed where a proxy is the only way to access the cluster.

For example, in BigAnimal, PGD clusters are locked down such that the only access to the database is through an instance of PGD Proxy. This configuration reduces the footprint of the cluster and makes it more secure. However, it requires that you use a different way of sending maintenance requests to the cluster's nodes.

The technique outlined here is generally useful for despatching commands to specific nodes without being directly connected to that node's server.

Maintenance commands

The term maintenance commands refers to:

  • Non-replicated DDL commands (which you might want to manually replicate)

A note on node names

The servers in the cluster are referred to by their PGD cluster node names. To get a list of node names in your cluster, use:

select node_name from bdr.node;

For more details, see the bdr.node table.

This command lists just the node names. If you need to know the group they are a member of, use:

select node_name, node_group_name from bdr.node_summary;

For more details, see the bdr.node_summary table.

Finding the write leader

If you're connected through the proxy, then you're connected to the write leader. Run select node_name from bdr.local_node_summary to see the name of the node:

select node_name from bdr.local_node_summary;
(1 row)

This is the node you do not want to run your maintenance tasks on.

select * from bdr.node_group_routing_summary;
 node_group_name |    write_lead    | previous_write_lead |       read_nodes
 dc1             | node-two         | node-one            | {node-one,node-three}

Where the write_lead is the node determined earlier (node-two), you can also see the two read_nodes (node-one and node-three). It's on these nodes that you can safely perform maintenance.


You can perform that operation with a single query:

select read_nodes from bdr.node_group_routing_summary where write_lead = (select node_name from bdr.local_node_summary);

Using bdr.run_on_nodes()

PGD has the ability to run specific commands on specific nodes using the bdr.run_on_nodes() function. This function takes two parameters: an array of node names and the command you want to run on those nodes. For example:

SELECT bdr.run_on_nodes(ARRAY['node-one','node-three'],'vacuum full foo');
 [{"dsn": "host=host-one port=5444 dbname=bdrdb", "node_id": "807899305", "response": {"command_status": "VACUUM"}, "node_name": "node-one", "query_send_time": "2024-01-16 16:24:35.418323+00"}, {"dsn": "host=host-three port=5432 dbname=bdrdb", "node_id": "199017004", "response": {"command_status": "VACUUM"}, "node_name": "node", "query_send_time": "2024-01-16 16:24:35.4542+00"}]

This command runs the vacuum full foo command on the node-one and node-three nodes. The node names are passed to the function in an array.

The bdr.run_on_nodes function reports its results as JSONB. The results include the name of the node and the response (or error message) resulting from running the command. Other fields included might be include and might not be relevant.

The results also appear as a single string that's hard to read. By applying some formatting to this string, it can become more readable.

Formatting bdr.run_on_nodes() output

Using Postgres's JSON expressions, you can reduce the output to just the columns you're interested in. The following command is functionally equivalent to the previous example but lists only the node and response as its results:

select q->>'node_name' as node, q->>'response' as response FROM jsonb_array_elements(bdr.run_on_nodes(ARRAY['node-one','node-three'], 'VACUUM FULL foo')) q;
         node     |       	response
 node-one         | {"command_status": "VACUUM"}
 node-three       | {"command_status": "VACUUM"}

If an error occurs, the command_status field is set to error. An additional error_message value is included in the response. For example:

select q->>'node_name' as node, q->>'response' as response FROM jsonb_array_elements(bdr.run_on_nodes(ARRAY['node-one','node-three'], 'VACUUM FULL fool')) q;
   	node   	      |                                      	response
 node-one         | {"error_message": "ERROR:  relation \"fool\" does not exist\n", "command_status": "ERROR"}
 node-three       | {"error_message": "ERROR:  relation \"fool\" does not exist\n", "command_status": "ERROR"}
(2 rows)

Defining a function for maintenance

If you find yourself regularly issuing maintenance commands to one node at a time, you can define a function to simplify things:

create or replace function runmaint(nodename varchar, command varchar) returns TABLE(node text,response jsonb) as $$
return query
select (q->>'node_name')::text, (q->'response') from jsonb_array_elements(bdr.run_on_nodes(ARRAY [nodename], command)) as q;
$$ language 'plpgsql';

This function takes a node name and a command and runs the command on that node, returning the results as shown in this interaction:

select runmaint('node-one','VACUUM FULL foo');
 (node-one,"{""command_status"": ""VACUUM""}")

You can break up the response by using select * from:

select * from runmaint('node-one','VACUUM FULL foo');
   	node          |       	response
 node-one         | {"command_status": "VACUUM"}
(1 row)