WarehousePG Enterprise Manager (WEM) exposes metrics through two separate channels. The Exporter collects metrics from the WarehousePG (WHPG) cluster and pushes them to Prometheus via remote write. WEM also exposes an internal metrics endpoint at /prom/metrics, which Prometheus scrapes directly.

All exporter metrics use the prefix warehousepg_observability_. All WEM internal metrics use the prefix wem_.
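Because the two channels use distinct prefixes, a metric name alone is enough to tell where it came from. The following is a minimal sketch of that check; the function name `metric_source` and the returned labels are illustrative and not part of WEM.

```python
# Prefixes as documented on this page.
EXPORTER_PREFIX = "warehousepg_observability_"
WEM_PREFIX = "wem_"


def metric_source(name: str) -> str:
    """Return which channel a metric name belongs to, based on its prefix."""
    if name.startswith(EXPORTER_PREFIX):
        return "exporter"
    if name.startswith(WEM_PREFIX):
        return "wem"
    return "unknown"


print(metric_source("warehousepg_observability_connected"))
print(metric_source("wem_up"))
```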

Exporter metrics

The Exporter process collects the following metrics and pushes them to Prometheus via remote write.

Cluster connectivity

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_connected | Gauge | Indicates whether the database connection is valid (1=connected, 0=not connected). |

Counting segments

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_clinfo_total_primary_count | Gauge | Total number of primary segments in the cluster. |
| warehousepg_observability_clinfo_total_mirror_count | Gauge | Total number of mirror segments in the cluster. |
| warehousepg_observability_clinfo_curr_primary_down_count | Gauge | Number of primary segments currently down. |
| warehousepg_observability_clinfo_curr_mirror_down_count | Gauge | Number of mirror segments currently down. |
| warehousepg_observability_clinfo_promoted_mirror_count | Gauge | Number of mirror segments that have been promoted to primary. |
| warehousepg_observability_clinfo_preff_primary_up_count | Gauge | Number of segments running on their preferred primary host. |
| warehousepg_observability_clinfo_curr_mirror_up_count | Gauge | Number of mirror segments currently up. |
| warehousepg_observability_clinfo_primary_not_in_sync_count | Gauge | Number of primary segments not in sync with their mirrors. |
| warehousepg_observability_clinfo_mirror_not_in_sync_count | Gauge | Number of mirror segments not in sync with their primaries. |
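The segment gauges above can be combined into a single health summary. The sketch below assumes the latest gauge values have already been read into a dictionary keyed by metric name; the function `segment_summary` and the rule for flagging a cluster as degraded are illustrative choices, not WEM behavior.

```python
PREFIX = "warehousepg_observability_clinfo_"


def segment_summary(samples):
    """Summarize segment health from the latest clinfo gauge values."""

    def get(suffix):
        return samples[PREFIX + suffix]

    # Any of these being non-zero suggests the cluster needs attention.
    problem_counts = {
        "primaries_down": get("curr_primary_down_count"),
        "mirrors_down": get("curr_mirror_down_count"),
        "mirrors_promoted": get("promoted_mirror_count"),
        "primaries_not_in_sync": get("primary_not_in_sync_count"),
        "mirrors_not_in_sync": get("mirror_not_in_sync_count"),
    }
    summary = dict(problem_counts)
    summary["degraded"] = any(value > 0 for value in problem_counts.values())
    return summary


example = {
    PREFIX + "curr_primary_down_count": 0,
    PREFIX + "curr_mirror_down_count": 1,
    PREFIX + "promoted_mirror_count": 0,
    PREFIX + "primary_not_in_sync_count": 1,
    PREFIX + "mirror_not_in_sync_count": 0,
}
print(segment_summary(example))
```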

Counting coordinators

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_coordinator_total_count | Gauge | Total number of coordinator nodes. |
| warehousepg_observability_coordinator_curr_active_up_count | Gauge | Number of active coordinator nodes currently up. |
| warehousepg_observability_coordinator_curr_active_down_count | Gauge | Number of active coordinator nodes currently down. |
| warehousepg_observability_coordinator_curr_standby_up_count | Gauge | Number of standby coordinator nodes currently up. |
| warehousepg_observability_coordinator_curr_standby_down_count | Gauge | Number of standby coordinator nodes currently down. |
| warehousepg_observability_coordinator_standby_synced_count | Gauge | Number of standby coordinator nodes in sync with the active coordinator. |
| warehousepg_observability_coordinator_standby_not_in_sync_count | Gauge | Number of standby coordinator nodes not in sync. |
| warehousepg_observability_coordinator_failed_over_count | Gauge | Number of coordinator nodes that have failed over. |

Tracking segment status

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_seg_status | Gauge | Per-segment up/down status (1=up, 0=down). |

Monitoring query and connection states

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_q_total | Gauge | Total number of connections. |
| warehousepg_observability_q_active | Gauge | Number of connections with active queries. |
| warehousepg_observability_q_idle | Gauge | Number of idle connections. |
| warehousepg_observability_q_idle_txn | Gauge | Number of connections idle inside an open transaction. |
| warehousepg_observability_q_idle_txn_aborted | Gauge | Number of connections idle inside an aborted transaction. |
| warehousepg_observability_q_fastpath | Gauge | Number of connections executing fast-path function calls. |
| warehousepg_observability_q_disabled | Gauge | Number of disabled connections. |
| warehousepg_observability_q_blocked | Gauge | Number of queries blocked waiting for locks. |
| warehousepg_observability_q_long_running_120s | Gauge | Number of queries that have been running for more than 120 seconds. |
| warehousepg_observability_q_in_wait | Gauge | Number of queries currently in a wait state. |
| warehousepg_observability_qu | Gauge | Number of connections per database user. |
| warehousepg_observability_txn_total_queries_executed | Gauge | Total number of queries executed since the last collection interval. |

Tracking database sizes

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_db_size | Gauge | Size of each database in bytes. |

Tracking spill activity

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_spill_files_total | Gauge | Total number of spill files per database. |
| warehousepg_observability_spill_bytes_total | Gauge | Total bytes spilled per database. |
| warehousepg_observability_spill_files_total_by_segment | Gauge | Total number of spill files per segment. |
| warehousepg_observability_spill_bytes_total_by_segment | Gauge | Total bytes spilled per segment. |

Monitoring resource groups

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_resgroup_status_concurrency | Gauge | Configured concurrency limit for the resource group. |
| warehousepg_observability_resgroup_status_cpu_max_pt | Gauge | Maximum CPU percentage limit for the resource group. |
| warehousepg_observability_resgroup_status_running_queries | Gauge | Number of queries currently running in the resource group. |
| warehousepg_observability_resgroup_status_queued_queries | Gauge | Number of queries currently queued in the resource group. |
| warehousepg_observability_resgroup_status_total_queued | Counter | Total number of queries ever queued in the resource group. |
| warehousepg_observability_resgroup_status_total_executed | Counter | Total number of queries ever executed in the resource group. |
| warehousepg_observability_resgroup_status_queue_time_seconds | Counter | Cumulative time queries have spent queued in the resource group, in seconds. |
| warehousepg_observability_resgroup_host_cpu_usage | Gauge | CPU usage for a resource group on a specific host. |
| warehousepg_observability_resgroup_host_memory_usage | Gauge | Memory usage for a resource group on a specific host. |
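The resource group counters only grow, so they are most useful as differences between two samples. As a rough sketch, the average queue wait over an interval can be estimated by dividing the change in queue_time_seconds by the change in total_queued. The function name and the short dictionary keys below are illustrative, and returning None when a counter appears to have reset is a design choice of this example.

```python
def avg_queue_wait_seconds(prev, curr):
    """Estimate average queue wait between two samples of one resource group.

    prev and curr hold counter values under the keys "queue_time_seconds"
    and "total_queued". Returns None when no queries were queued in the
    interval or when a counter went backwards (for example after a restart).
    """
    delta_time = curr["queue_time_seconds"] - prev["queue_time_seconds"]
    delta_queued = curr["total_queued"] - prev["total_queued"]
    if delta_queued <= 0 or delta_time < 0:
        return None
    return delta_time / delta_queued


earlier = {"queue_time_seconds": 10.0, "total_queued": 4}
later = {"queue_time_seconds": 25.0, "total_queued": 9}
print(avg_queue_wait_seconds(earlier, later))
```

Here 15 seconds of queue time accumulated across 5 newly queued queries, which gives an average of 3 seconds per query.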

Collecting host hardware metrics

These metrics reflect the physical resources of each cluster host.

| Metric | Type | Description |
| --- | --- | --- |
| warehousepg_observability_node_cpu_seconds_total | Counter | CPU time accumulated per mode (user, system, idle, iowait). |
| warehousepg_observability_node_memory_MemTotal_bytes | Gauge | Total physical memory on the host, in bytes. |
| warehousepg_observability_node_memory_MemAvailable_bytes | Gauge | Available memory on the host, in bytes. |
| warehousepg_observability_node_memory_Cached_bytes | Gauge | Memory used by the OS page cache, in bytes. |
| warehousepg_observability_node_disk_read_bytes_total | Counter | Total bytes read from disk. |
| warehousepg_observability_node_disk_written_bytes_total | Counter | Total bytes written to disk. |
| warehousepg_observability_node_network_receive_bytes_total | Counter | Total bytes received over the network. |
| warehousepg_observability_node_network_transmit_bytes_total | Counter | Total bytes transmitted over the network. |
| warehousepg_observability_node_load1 | Gauge | 1-minute load average. |
| warehousepg_observability_node_load5 | Gauge | 5-minute load average. |
| warehousepg_observability_node_load15 | Gauge | 15-minute load average. |
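Since node_cpu_seconds_total is a per-mode counter, CPU utilization has to be derived from the change between two samples. The sketch below assumes each sample is a dictionary mapping a mode name to its accumulated seconds for one host; treating both idle and iowait as non-busy time is a choice made for this example, and the function name is illustrative.

```python
def cpu_busy_fraction(prev, curr):
    """Return the fraction of CPU time spent busy between two samples.

    prev and curr map a mode name (user, system, idle, iowait) to the
    accumulated CPU seconds for that mode. Returns None if no time elapsed
    or if any counter went backwards.
    """
    deltas = {mode: curr[mode] - prev[mode] for mode in curr if mode in prev}
    total = sum(deltas.values())
    if total <= 0 or any(delta < 0 for delta in deltas.values()):
        return None
    # Assumption: idle and iowait both count as time not spent on work.
    not_busy = deltas.get("idle", 0.0) + deltas.get("iowait", 0.0)
    return (total - not_busy) / total


before = {"user": 100.0, "system": 50.0, "idle": 800.0, "iowait": 50.0}
after = {"user": 130.0, "system": 60.0, "idle": 850.0, "iowait": 60.0}
print(cpu_busy_fraction(before, after))
```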

WEM internal metrics

These metrics describe the health and behavior of WEM itself. Prometheus scrapes them directly from the /prom/metrics endpoint.
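For ad hoc inspection outside Prometheus, the endpoint can also be read with a plain HTTP request. The sketch below assumes the endpoint returns the standard Prometheus text exposition format; the base URL passed to `fetch_wem_metrics`, the function names, and the `pool="main"` label in the sample text are hypothetical, and the parser handles only simple sample lines.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$'
)


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, labels, value = match.groups()
        samples.append((name, labels or "", float(value)))
    return samples


def fetch_wem_metrics(base_url):
    """Fetch and parse metrics from a WEM instance at base_url (hypothetical)."""
    with urllib.request.urlopen(base_url + "/prom/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


sample_text = """\
# TYPE wem_up gauge
wem_up 1
wem_pool_idle_conns{pool="main"} 3
"""
print(parse_metrics(sample_text))
```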

Monitoring canary checks

| Metric | Type | Description |
| --- | --- | --- |
| wem_canary_duration_ms | Gauge | Execution time of the most recent canary check run, in milliseconds. |
| wem_canary_status | Gauge | Result of the most recent canary check run (0=success, 1=warning, 2=critical). |
| wem_canary_last_run_timestamp | Gauge | Unix timestamp of the most recent canary check execution. |
| wem_canary_row_count | Gauge | Row count returned by the most recent canary check query. |
| wem_canary_checks_total | Counter | Total number of canary check executions since WEM started. |
| wem_canary_failures_total | Counter | Total number of canary check executions that returned a warning or critical result. |
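The canary metrics lend themselves to two small derived values: a readable label for wem_canary_status and a failure ratio from the two counters. The following is a minimal sketch using the status codes documented above; the function names are illustrative.

```python
# Status codes as documented for wem_canary_status.
CANARY_STATUS = {0: "success", 1: "warning", 2: "critical"}


def canary_status_label(value):
    """Translate a wem_canary_status sample into a readable label."""
    return CANARY_STATUS.get(int(value), "unknown")


def canary_failure_ratio(checks_total, failures_total):
    """Return the share of canary runs that ended in warning or critical.

    Returns None when no checks have run yet.
    """
    if checks_total <= 0:
        return None
    return failures_total / checks_total


print(canary_status_label(2.0))
print(canary_failure_ratio(200, 5))
```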

Tracking WEM system state

| Metric | Type | Description |
| --- | --- | --- |
| wem_up | Gauge | Indicates whether the WEM process is running (1=running). |
| wem_scheduler_lock_held | Gauge | Indicates whether this WEM instance currently holds the scheduler lock and is actively running canary checks (1=held). |
| wem_build_info | Gauge | Static build metadata for this WEM instance. Always 1. |

Monitoring connection pools

| Metric | Type | Description |
| --- | --- | --- |
| wem_pool_total_conns | Gauge | Total number of connections in the pool (idle and acquired). |
| wem_pool_idle_conns | Gauge | Number of idle connections available in the pool. |
| wem_pool_acquired_conns | Gauge | Number of connections currently in use. |
| wem_pool_max_conns | Gauge | Maximum number of connections configured for the pool. |
| wem_pool_utilization_percent | Gauge | Pool utilization expressed as a percentage of the maximum connection limit. |
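The pool gauges relate to one another in a simple way, which the sketch below makes explicit. It assumes utilization is computed from acquired connections divided by the configured maximum; this page does not state which count WEM uses as the numerator, so treat that as an assumption. The function names are illustrative.

```python
def pool_utilization_percent(acquired_conns, max_conns):
    """Return pool utilization as a percentage of the maximum.

    Assumption: the numerator is the number of acquired connections.
    Returns None when the maximum is not a positive number.
    """
    if max_conns <= 0:
        return None
    return 100.0 * acquired_conns / max_conns


def pool_counts_consistent(total_conns, idle_conns, acquired_conns):
    """Check that total connections equal idle plus acquired connections."""
    return total_conns == idle_conns + acquired_conns


print(pool_utilization_percent(5, 20))
print(pool_counts_consistent(8, 3, 5))
```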
