Verifying the cluster health

Ensure your WarehousePG (WHPG) environment remains available and efficient by monitoring real-time health metrics and resource utilization. To verify your cluster configuration and current status, select the Cluster panel from the left sidebar.

Confirming core cluster availability

To ensure consistent database access for your applications, verify that your cluster is online and responding to requests. Monitoring node health and uptime allows you to identify potential service interruptions before they impact users.

Verify that the Cluster Status panel shows Healthy. A Degraded state indicates that one or more segments have failed or that synchronization is lagging. Identify the affected components in the Segment Details table.
Ensure the count of Up segments matches your total segment count. If any segments are Down, your cluster is at risk of data loss or reduced performance. Locate the affected components in the Segment Details table.
Confirm the primary coordinator host is up and the standby is Synchronized. If the standby appears as Not Synced, a failover event can result in data loss or extended downtime. To resolve synchronization issues, check the network connectivity or restart the standby process.
Compare active connections against the maximum limit. Once connections reach the limit, new application requests are rejected. To prevent rejection, terminate idle sessions or increase the max_connections configuration parameter before the count gets close to the ceiling.
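
The segment and connection checks above can also be scripted outside the console. The following is a minimal Python sketch, not part of the WHPG tooling. It assumes you have already fetched segment rows (for example from the gp_segment_configuration catalog table, where status is u for up and d for down) and the current connection count. The function names, the example query strings, and the 80% warning threshold are hypothetical choices for illustration.

```python
from typing import Iterable, Mapping

# Example queries you might run to collect the inputs (assumptions; adjust
# them to your environment and client library).
SEGMENT_QUERY = (
    "SELECT content, role, status, hostname, port "
    "FROM gp_segment_configuration"
)
CONNECTION_QUERY = "SELECT count(*) FROM pg_stat_activity"
MAX_CONNECTIONS_QUERY = "SHOW max_connections"


def summarize_segments(rows: Iterable[Mapping[str, object]]) -> dict:
    """Count up and down instances in rows that carry a 'status' key.

    Note: gp_segment_configuration also lists the coordinator and standby
    (content = -1); filter them out first if you only want segments.
    """
    rows = list(rows)
    up = sum(1 for r in rows if r["status"] == "u")
    down = len(rows) - up
    return {"total": len(rows), "up": up, "down": down, "all_up": down == 0}


def connection_usage(active: int, max_connections: int,
                     warn_ratio: float = 0.8) -> dict:
    """Report how close active connections are to max_connections.

    warn_ratio is an illustrative threshold, not a product default.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    ratio = active / max_connections
    return {"ratio": ratio, "near_limit": ratio >= warn_ratio}
```

If summarize_segments reports any down instances, or connection_usage reports near_limit, continue with the steps in the next section.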

Resolving cluster availability issues

Verify that your cluster is online and responding to requests. Use the Cluster panel to monitor node health and uptime. If you identify a problem in the summary cards, follow these steps to restore service:

Confirm the Up count matches your total segment configuration. If segments are Down, search the Segment Details table for the specific primary or mirror nodes that are offline.
Review the Hostname and Port columns in the Segment Details table. If multiple failed segments share a hostname, the physical host likely has a hardware or network issue. If the host itself is unreachable, reboot it or resolve the network outage.
Once you identify the failed segments, run gprecoverseg from the command line to return them to service. See Recovering from segment failures for details on segment recovery.
Confirm the primary coordinator is up and the standby is Synchronized. If the standby appears as Not Synced, restart the standby process to prevent data loss during a failover event. See Enabling coordinator mirroring for details.
If active connections reach the maximum limit, new application requests fail. Terminate idle sessions or increase the max_connections parameter before the count reaches the limit to prevent service rejection.
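
The host-level triage described above (several failed segments sharing one hostname) can be sketched in a few lines of Python. This is an illustrative helper, not a WHPG utility. It assumes each row is a mapping with status, hostname, content, role, and port keys, as you might get from gp_segment_configuration. The function names and the min_failures threshold are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def down_segments_by_host(rows: Iterable[Mapping[str, object]]) -> dict:
    """Group down segments (status == 'd') by the host they run on."""
    by_host = defaultdict(list)
    for r in rows:
        if r["status"] == "d":
            by_host[r["hostname"]].append((r["content"], r["role"], r["port"]))
    return dict(by_host)


def suspect_hosts(rows: Iterable[Mapping[str, object]],
                  min_failures: int = 2) -> list:
    """Return hosts with at least min_failures down segments.

    Several failures on one host hint at a hardware or network problem
    on that host rather than an isolated segment fault.
    """
    grouped = down_segments_by_host(rows)
    return sorted(h for h, segs in grouped.items() if len(segs) >= min_failures)
```

Hosts returned by suspect_hosts are candidates for a reachability check before you run gprecoverseg. A host with a single failed segment might only need segment recovery.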

Analyzing database utilization

Prevent a single database or tenant from impacting the overall cluster performance by identifying resource outliers. Monitoring individual database metrics allows you to balance the load and ensure fair access to storage and connections.

Compare database sizes to identify rapidly growing databases that might require storage expansion or data vacuuming. If a database consumes excessive space, perform a VACUUM operation to reclaim storage or plan a disk expansion before the volume reaches capacity.
Monitor connection counts per database to identify unauthorized access or application connection leaks. If a single database shows an unusual spike in sessions, investigate the application for connection leaks, terminate idle sessions, or implement a connection pooler.
Verify that only authorized databases are active by checking the Database list. If you find an unknown database consuming resources, contact the owner or drop the database to free up cluster capacity.
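
The per-database checks above amount to spotting outliers and unexpected names. The following Python sketch shows one possible way to do that. It is not how the console computes anything. It assumes you have collected a size or connection count per database, for example with the query strings shown, which are themselves assumptions. The median-based rule, the factor of 3, and the function names are hypothetical.

```python
from statistics import median
from typing import Iterable, Mapping

# Example queries for gathering the inputs (assumptions; adapt as needed).
SIZE_QUERY = (
    "SELECT datname, pg_database_size(datname) AS size_bytes "
    "FROM pg_database WHERE datallowconn"
)
SESSIONS_QUERY = (
    "SELECT datname, count(*) AS sessions "
    "FROM pg_stat_activity GROUP BY datname"
)


def flag_outliers(metric_by_db: Mapping[str, float],
                  factor: float = 3.0) -> list:
    """Return databases whose metric exceeds factor times the median.

    A simple heuristic only: if the median is 0, any database with a
    positive value is flagged.
    """
    if not metric_by_db:
        return []
    threshold = factor * median(metric_by_db.values())
    return sorted(db for db, value in metric_by_db.items() if value > threshold)


def unknown_databases(active: Iterable[str],
                      authorized: Iterable[str]) -> list:
    """Return active database names missing from the authorized list."""
    return sorted(set(active) - set(authorized))
```

The same flag_outliers helper works for both sizes and session counts. Treat its output as a prompt to investigate, not as a reason to drop a database or terminate sessions automatically.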
