Cluster topology v10.5

Cluster topology provides a visual representation of your PostgreSQL replication clusters. It automatically discovers which database servers are connected through streaming replication and displays them in an interactive diagram showing:

  • Primary servers — The source of replication.
  • Replica servers — Servers receiving replicated data.
  • Replication relationships — Which servers replicate from which.
  • Sync states — Whether replication is synchronous or asynchronous.
  • Cascading replication — Replicas that replicate from other replicas.
  • PGD/BDR clusters — Mesh replication where all nodes replicate to all others.
  • Witness nodes — Non-data nodes used for consensus in high availability (HA) setups.

How topology discovery works

PEM discovers replication topology using dedicated probes that collect data from your PostgreSQL servers. The probes gather raw replication data, which is then processed by a correlation function to build the cluster topology.

Note

The topology view is updated when you click Refresh, which triggers the correlation function to process the latest probe data and update the cluster information.

Required probes

For the topology feature to work correctly, enable the following probes on your monitored servers:

Probe nameRequired forDescription
Streaming ReplicationAll cluster typesCollects replication data from pg_stat_replication (on primary) and pg_stat_wal_receiver (on standbys). Required for all topology discovery.
PGD Group Summary (bdr_group_summary)PGD clustersCollects PGD group-level information.
PGD Node Summary (bdr_node_summary)PGD clustersCollects PGD node information including node names, subgroups, and write leader status.
EFM Cluster Node StatusEFM clustersCollects EFM node data via the efm_cluster_node_status probe.

These probes are enabled by default on new server registrations. If topology data isn't appearing, verify that these probes are enabled in Management > Probe Configuration.

Cluster type detection

PEM automatically detects the cluster type based on server configuration:

Cluster typeHow it's detected
Physical Streaming Replication (PSR)Default for servers with streaming replication and no HA manager configured.
PGDServers with the BDR extension installed (detected via PGD probes).
PatroniServers with Patroni Cluster Name configured in server properties.
EDB Failover Manager (EFM)Servers with EFM Cluster Name configured in server properties.

To enable Patroni or EFM cluster detection, configure the respective cluster name in the server's properties when registering or editing the server in PEM.

Accessing the topology view

You can open the cluster topology view in two ways:

  • Click the three dots (⋯) in the top-right corner of the panel header and select Open > Cluster Topology.
  • Right-click a server in the object tree and select Show Cluster Topology.

Required permissions

To view cluster topology, users need the pem_topology_view role. Contact your PEM administrator to grant this permission.

Understanding the topology display

Cluster list (left sidebar)

The sidebar shows all discovered replication clusters:

ElementDescription
Cluster nameAuto-generated or primary server description.
Type badgePSR, Patroni, EFM, or PGD.
Member countNumber of servers in the cluster.

Supported cluster types

TypeDescription
PSRPhysical Streaming Replication (standard PostgreSQL).
PatroniPatroni-managed HA clusters.
EFMEDB Failover Manager clusters.
PGDPostgres Distributed (BDR) with mesh replication.

Topology graph (main panel)

The interactive diagram shows nodes and the connections between them.

Node types

Node colorDescription
Green borderPrimary — The source server providing replication.
Blue borderReplica — Server receiving replicated data.
Purple borderData node — PGD data node that accepts reads and writes.
Orange borderWitness — Non-data node for HA consensus.
Gray borderUnknown — Unregistered replica (shows IP address).

Each node displays the server name or description, hostname and port, a role badge (Primary/Replica), and a sync state badge for replicas.

The legend in the upper-right corner of the diagram explains the color coding.

Interactive SVG view

The topology visualization uses an interactive scalable vector graphics (SVG) view with the following features.

Pan and zoom

  • Zoom in/out — Use your mouse scroll wheel.
  • Pan — Click and drag anywhere on the canvas.
  • Zoom controls — Use the + and buttons in the bottom-right corner. The Fit to screen button resets the view to fit all nodes in the viewport.

Fullscreen mode

Click the Fullscreen button in the header to expand the view. This is useful for presenting topology to team members, viewing large clusters, or taking screenshots for documentation. Press Escape or click the button again to exit.

SVG export

Click the Download button in the header to export the current topology view as an SVG file.

Node card details

Each server is displayed as a card with the following information:

  • Icon — Storage icon for registered servers, help icon for unregistered servers.
  • Server name — The server description or hostname.
  • Role badgeP (Primary), R (Replica), D (Data Node for PGD), W (Write Leader or Witness), ? (Unknown).
  • Hostname:Port — The server's connection information.
  • Tags — Server tags such as "Write Leader" and "Raft_Leader" displayed as pill-shaped badges.
  • Status — "Not Registered" indicator for servers not registered in PEM.

Renderer preference

PEM offers two rendering modes. Switch between them in Preferences > Cluster Topology > Renderer:

RendererDescription
SVG (default)Interactive SVG view with pan/zoom and smooth tooltips.
HTMLClassic HTML/CSS rendering without pan/zoom.

PGD/BDR clusters

Postgres Distributed (PGD), also known as BDR (Bi-Directional Replication), is a multimaster replication solution where all nodes can accept writes. Unlike traditional primary-replica setups, PGD uses mesh replication where every node replicates to every other node.

Polygon and force graph layouts

Polygon layout (1–8 nodes)

For PGD clusters with 1–8 data nodes, PEM uses a polygon mesh layout — a single centered card for 1 node, a horizontal pair for 2, and polygon formations (triangle, square, pentagon, and so on) for 3–8 nodes. Dashed lines connect all nodes to visualize the mesh replication topology.

Force graph layout (9+ nodes)

For larger PGD clusters with 9 or more data nodes, PEM switches to a force graph layout. Nodes are displayed as colored circles with numbered labels. Hover over a node to see full details in a tooltip. Write leaders are highlighted with an outer ring, and sparse mesh lines show connectivity without visual clutter.

Subgroups

PGD clusters can organize nodes into subgroups (also called locations), typically representing geographic data centers or availability zones. When a PGD cluster has multiple subgroups, PEM displays them as separate bordered cards with intersubgroup dashed lines connecting the subgroup centers. For 3 or more subgroups, cards are arranged in a circle.

If all nodes belong to the same subgroup or no subgroup is defined, PEM displays the nodes directly using the standard polygon or force graph layout.

Working with unregistered replicas

When PEM discovers a replica that isn't registered as a PEM-monitored server, it appears as an "Unknown" node showing the IP address and a "Not Registered" label inside the card.

To resolve this, register the replica server in PEM.

Refreshing topology data

The Streaming Replication probe runs every 5 minutes by default, collecting the latest replication data from your servers. This raw data needs to be processed by the correlation function to update the topology view.

Click Refresh in the topology view header to trigger the correlation function, update the cluster list with any new or changed clusters, and refresh the current topology graph. Always click Refresh after making changes to your replication setup — such as adding replicas or performing failovers — to see the updated topology.

Troubleshooting

No clusters appear

Possible causes:

  • The Streaming Replication probe hasn't run yet — wait 5 minutes or trigger a manual refresh.
  • No streaming replication is configured on monitored servers.
  • The streaming_replication probe is disabled.

Check that the probe is enabled in Probe Configuration, verify that pg_stat_replication returns data on your primary servers, and then manually trigger a refresh.

Server shows as "Unknown"

The replica's IP address doesn't match any registered server. Register the server in PEM, or ensure the server's server or hostaddr field matches the IP seen in replication.

Topology graph is empty

Possible causes:

  • The selected cluster has no active members.
  • All members have become inactive.

Check that the servers are running and agents are connected, then verify the probe is collecting data:

SELECT * FROM pemdata.streaming_replication;

PGD cluster not detected

The PGD Node Summary (bdr_node_summary) probe isn't collecting data. Verify the probe is enabled for the PGD servers, check that the server has BDR installed and configured, and query the probe data:

SELECT * FROM pemdata.bdr_node_summary WHERE server_id = <id>;

PGD node shows as "Unregistered"

The BDR node name doesn't match any registered server's description or hostname. Check the node name in the PGD Node Summary (bdr_node_summary) probe:

SELECT node_name FROM pemdata.bdr_node_summary;

Then update the server description to match the node name exactly, or register a new server with the correct node name as the description.

Best practices

  • Register all servers — For the most accurate topology view, register all servers including replicas. Unregistered nodes show with a help icon and "Not Registered" status.
  • Use consistent naming — Give servers meaningful descriptions so the topology diagram is easy to understand.
  • For PGD clusters — Set server descriptions to match BDR node names exactly for proper node matching.
  • Monitor probe status — Ensure the streaming_replication probe is enabled and running.
  • Check after failovers — After a planned or unplanned failover, refresh the topology to verify the new configuration.
  • Use SVG export for documentation — Export the topology as SVG for including in architecture documentation or presentations.
  • Use fullscreen for large clusters — For clusters with many nodes, fullscreen mode provides a better overview.
  • Zoom for details — Use the zoom controls to examine individual node details or get a bird's-eye view of the entire cluster.
  • Check node tooltips — In force graph layouts, hover over nodes to see detailed information including hostname, port, role, and tags.

SQL queries for topology data

If you prefer querying topology data directly via SQL — for scripting, reporting, or integration with other tools — run the following queries against the PEM database.

List all discovered clusters

SELECT
    c.id AS cluster_id,
    c.name AS cluster_name,
    c.cluster_type,
    ps.description AS primary_server,
    ps.server AS primary_host,
    (SELECT COUNT(*) FROM pem.replication_cluster_member m
     WHERE m.cluster_id = c.id AND m.is_active = true) AS active_members,
    c.updated_at AS last_updated
FROM pem.replication_cluster c
LEFT JOIN pem.server ps ON ps.id = c.primary_server_id
ORDER BY c.name;

View cluster members with roles

SELECT
    c.name AS cluster_name,
    c.cluster_type,
    m.role,
    COALESCE(s.description, m.client_hostname, host(m.client_addr)::text) AS node_name,
    s.server AS hostname,
    s.port,
    m.sync_state,
    m.cascade_level,
    m.subgroup_name,
    CASE WHEN m.server_id IS NOT NULL THEN 'Yes' ELSE 'No' END AS registered_in_pem,
    m.updated_at
FROM pem.replication_cluster_member m
JOIN pem.replication_cluster c ON c.id = m.cluster_id
LEFT JOIN pem.server s ON s.id = m.server_id
WHERE m.is_active = true
ORDER BY c.name, m.cascade_level, m.role DESC;

View replication lag (PSR, EFM, and Patroni clusters)

SELECT
    c.name AS cluster_name,
    ps.description AS primary_server,
    sr.client_addr AS replica_address,
    COALESCE(rs.description, sr.client_addr) AS replica_name,
    m2.sync_state,
    sr.lag_mb,
    sr.recorded_time AS last_collected
FROM pemdata.streaming_replication sr
JOIN pem.replication_cluster_member m
    ON m.server_id = sr.server_id AND m.role = 'primary' AND m.is_active = true
JOIN pem.replication_cluster c ON c.id = m.cluster_id
JOIN pem.server ps ON ps.id = sr.server_id
LEFT JOIN pem.server rs ON host(rs.server::inet) = host(sr.client_addr::inet)
LEFT JOIN pem.replication_cluster_member m2
    ON m2.cluster_id = c.id AND host(m2.client_addr) = host(sr.client_addr::inet)
    AND m2.is_active = true
WHERE c.cluster_type IN ('psr', 'efm', 'patroni')
  AND sr.client_addr IS NOT NULL
ORDER BY c.name, sr.lag_mb DESC;

View replication lag (PGD clusters)

SELECT
    c.name AS cluster_name,
    ps.description AS write_leader,
    bns.target_name AS peer_node,
    bns.client_addr AS peer_address,
    bns.active_pid AS slot_pid,
    bns.replay_lag_bytes,
    ROUND(bns.replay_lag_bytes / (1024.0 * 1024), 2) AS lag_mb,
    bns.replay_lag_size AS lag_pretty,
    bns.recorded_time AS last_collected
FROM pemdata.bdr_node_slots bns
JOIN pem.replication_cluster_member m
    ON m.server_id = bns.server_id AND m.role = 'primary' AND m.is_active = true
JOIN pem.replication_cluster c ON c.id = m.cluster_id
JOIN pem.server ps ON ps.id = bns.server_id
WHERE c.cluster_type = 'pgd'
  AND bns.target_name IS NOT NULL
ORDER BY c.name, lag_mb DESC;

Complete topology overview (all cluster types)

This query gives a unified view of all clusters, members, and replication lag:

WITH cluster_lag AS (
    -- PSR/EFM/Patroni lag from streaming_replication
    SELECT
        m.cluster_id,
        host(sr.client_addr::inet) AS replica_addr,
        sr.lag_mb
    FROM pemdata.streaming_replication sr
    JOIN pem.replication_cluster_member m
        ON m.server_id = sr.server_id AND m.role = 'primary' AND m.is_active = true
    JOIN pem.replication_cluster c ON c.id = m.cluster_id
    WHERE c.cluster_type IN ('psr', 'efm', 'patroni')
      AND sr.client_addr IS NOT NULL

    UNION ALL

    -- PGD lag from bdr_node_slots
    SELECT
        m.cluster_id,
        host(bns.client_addr::inet) AS replica_addr,
        ROUND(bns.replay_lag_bytes / (1024.0 * 1024), 2) AS lag_mb
    FROM pemdata.bdr_node_slots bns
    JOIN pem.replication_cluster_member m
        ON m.server_id = bns.server_id AND m.role = 'primary' AND m.is_active = true
    JOIN pem.replication_cluster c ON c.id = m.cluster_id
    WHERE c.cluster_type = 'pgd'
      AND bns.target_name IS NOT NULL
)
SELECT
    c.name AS cluster_name,
    c.cluster_type,
    m.role,
    COALESCE(s.description, m.client_hostname, host(m.client_addr)::text) AS node_name,
    s.server AS hostname,
    s.port,
    m.sync_state,
    m.subgroup_name,
    COALESCE(cl.lag_mb, 0) AS lag_mb,
    CASE WHEN m.server_id IS NOT NULL THEN 'Yes' ELSE 'No' END AS registered
FROM pem.replication_cluster_member m
JOIN pem.replication_cluster c ON c.id = m.cluster_id
LEFT JOIN pem.server s ON s.id = m.server_id
LEFT JOIN cluster_lag cl
    ON cl.cluster_id = m.cluster_id
    AND cl.replica_addr = COALESCE(host(m.client_addr), s.server)
WHERE m.is_active = true
ORDER BY c.name, m.cascade_level, m.role DESC;