Metrics
WarehousePG Enterprise Manager (WEM) exposes metrics through two separate channels. The Exporter collects metrics from the WarehousePG (WHPG) cluster and pushes them to Prometheus using remote write. WEM also serves its own internal metrics at the `/prom/metrics` endpoint, which Prometheus scrapes directly.
All Exporter metrics use the prefix `warehousepg_observability_`, and all WEM internal metrics use the prefix `wem_`.
Exporter metrics
The Exporter process collects the following metrics and pushes them to Prometheus using remote write.
Cluster connectivity
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_connected | Gauge | Indicates whether the database connection is valid (1=connected, 0=not connected). |
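Because the Exporter pushes its series to Prometheus by remote write, you read them back through Prometheus rather than from the Exporter itself. The following Python sketch queries the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) for `warehousepg_observability_connected`. The base URL `PROMETHEUS_URL` and all helper names are hypothetical and are not part of WEM.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus base URL; adjust for your deployment.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run a PromQL instant query through the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def vector_values(response: dict) -> list[tuple[dict, float]]:
    """Return (labels, value) pairs from an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value_as_string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def cluster_connected(response: dict) -> bool:
    """True only if at least one series is returned and every series reports 1."""
    values = [v for _, v in vector_values(response)]
    return bool(values) and all(v == 1 for v in values)
```

For example, `cluster_connected(instant_query("warehousepg_observability_connected"))` returns `True` only when every returned series reports 1. The sketch treats an empty result as not connected, since a missing series gives no evidence that the connection is valid.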
Counting segments
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_clinfo_total_primary_count | Gauge | Total number of primary segments in the cluster. |
warehousepg_observability_clinfo_total_mirror_count | Gauge | Total number of mirror segments in the cluster. |
warehousepg_observability_clinfo_curr_primary_down_count | Gauge | Number of primary segments currently down. |
warehousepg_observability_clinfo_curr_mirror_down_count | Gauge | Number of mirror segments currently down. |
warehousepg_observability_clinfo_promoted_mirror_count | Gauge | Number of mirror segments that have been promoted to primary. |
warehousepg_observability_clinfo_preff_primary_up_count | Gauge | Number of segments running on their preferred primary host. |
warehousepg_observability_clinfo_curr_mirror_up_count | Gauge | Number of mirror segments currently up. |
warehousepg_observability_clinfo_primary_not_in_sync_count | Gauge | Number of primary segments not in sync with their mirrors. |
warehousepg_observability_clinfo_mirror_not_in_sync_count | Gauge | Number of mirror segments not in sync with their primaries. |
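The segment counts can be combined into a quick health summary. The following is a minimal Python sketch, assuming you already have the latest value of each gauge in a dictionary keyed by full metric name. The function name, and the rule that any down, unsynchronized, or promoted segment marks the cluster as degraded, are illustrative choices rather than WEM behavior.

```python
CLINFO = "warehousepg_observability_clinfo_"


def segment_summary(latest: dict[str, float]) -> dict:
    """Summarize segment health from the latest segment-count gauge values.

    `latest` maps full metric names to values; missing metrics count as 0.
    """
    def g(suffix: str) -> float:
        return latest.get(CLINFO + suffix, 0)

    total_primary = g("total_primary_count")
    primary_down = g("curr_primary_down_count")
    # Illustrative rule: any of these conditions marks the cluster as degraded.
    degraded = any(
        g(suffix) > 0
        for suffix in (
            "curr_primary_down_count",
            "curr_mirror_down_count",
            "primary_not_in_sync_count",
            "mirror_not_in_sync_count",
            "promoted_mirror_count",
        )
    )
    return {
        # Fraction of primary segments currently up (0.0 if none are reported).
        "primary_up_ratio": (total_primary - primary_down) / total_primary if total_primary else 0.0,
        "degraded": degraded,
    }
```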
Counting coordinators
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_coordinator_total_count | Gauge | Total number of coordinator nodes. |
warehousepg_observability_coordinator_curr_active_up_count | Gauge | Number of active coordinator nodes currently up. |
warehousepg_observability_coordinator_curr_active_down_count | Gauge | Number of active coordinator nodes currently down. |
warehousepg_observability_coordinator_curr_standby_up_count | Gauge | Number of standby coordinator nodes currently up. |
warehousepg_observability_coordinator_curr_standby_down_count | Gauge | Number of standby coordinator nodes currently down. |
warehousepg_observability_coordinator_standby_synced_count | Gauge | Number of standby coordinator nodes in sync with the active coordinator. |
warehousepg_observability_coordinator_standby_not_in_sync_count | Gauge | Number of standby coordinator nodes not in sync. |
warehousepg_observability_coordinator_failed_over_count | Gauge | Number of coordinator nodes that have failed over. |
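A similar sketch applies to the coordinator gauges, again assuming a dictionary of latest values keyed by full metric name. The function name and the problem messages are hypothetical.

```python
COORD = "warehousepg_observability_coordinator_"


def coordinator_problems(latest: dict[str, float]) -> list[str]:
    """Return human-readable problems derived from the coordinator gauges.

    Missing metrics count as 0.
    """
    def g(suffix: str) -> float:
        return latest.get(COORD + suffix, 0)

    problems = []
    if g("curr_active_up_count") < 1:
        problems.append("no active coordinator is up")
    if g("curr_standby_down_count") > 0:
        problems.append("standby coordinator is down")
    if g("standby_not_in_sync_count") > 0:
        problems.append("standby coordinator is not in sync")
    if g("failed_over_count") > 0:
        problems.append("a coordinator failover has occurred")
    return problems
```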
Tracking segment status
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_seg_status | Gauge | Per-segment up/down status (1=up, 0=down). |
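Because `warehousepg_observability_seg_status` is reported per segment, each series carries labels that identify its segment. This page does not list those label names, so the following sketch avoids assuming any. It takes (labels, value) pairs, such as the samples in a Prometheus instant-query result, and returns the label sets of segments reporting 0. The function names are hypothetical.

```python
def down_segments(samples: list[tuple[dict[str, str], float]]) -> list[dict[str, str]]:
    """Return the label sets of seg_status series that report 0 (down)."""
    return [labels for labels, value in samples if value == 0]


def status_counts(samples: list[tuple[dict[str, str], float]]) -> tuple[int, int]:
    """Return (segments up, segments down) from seg_status samples."""
    down = len(down_segments(samples))
    return len(samples) - down, down
```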
Monitoring query and connection states
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_q_total | Gauge | Total number of connections. |
warehousepg_observability_q_active | Gauge | Number of connections with active queries. |
warehousepg_observability_q_idle | Gauge | Number of idle connections. |
warehousepg_observability_q_idle_txn | Gauge | Number of connections idle inside an open transaction. |
warehousepg_observability_q_idle_txn_aborted | Gauge | Number of connections idle inside an aborted transaction. |
warehousepg_observability_q_fastpath | Gauge | Number of connections executing fast-path function calls. |
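warehousepg_observability_q_disabled | Gauge | Number of connections in the `disabled` state, which PostgreSQL reports when activity tracking (`track_activities`) is turned off for the backend. |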
warehousepg_observability_q_blocked | Gauge | Number of queries blocked waiting for locks. |
warehousepg_observability_q_long_running_120s | Gauge | Number of queries that have been running for more than 120 seconds. |
warehousepg_observability_q_in_wait | Gauge | Number of queries currently in a wait state. |
warehousepg_observability_qu | Gauge | Number of connections per database user. |
warehousepg_observability_txn_total_queries_executed | Gauge | Total number of queries executed since the last collection interval. |
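Raw connection counts are easier to compare across clusters when expressed as fractions of `warehousepg_observability_q_total`. The following sketch does that for a few states. Grouping idle-in-transaction and idle-in-aborted-transaction sessions together is an illustrative choice, and the function name is hypothetical.

```python
Q = "warehousepg_observability_q_"


def connection_breakdown(latest: dict[str, float]) -> dict[str, float]:
    """Express selected connection-state gauges as fractions of q_total.

    Returns an empty dict when q_total is missing or zero.
    """
    total = latest.get(Q + "total", 0)
    if total <= 0:
        return {}
    return {
        "active": latest.get(Q + "active", 0) / total,
        "idle": latest.get(Q + "idle", 0) / total,
        # Sessions holding a transaction open without running a query,
        # including those whose transaction has already aborted.
        "idle_in_txn": (latest.get(Q + "idle_txn", 0) + latest.get(Q + "idle_txn_aborted", 0)) / total,
    }
```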
Tracking database sizes
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_db_size | Gauge | Size of each database in bytes. |
Tracking spill activity
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_spill_files_total | Gauge | Total number of spill files per database. |
warehousepg_observability_spill_bytes_total | Gauge | Total bytes spilled per database. |
warehousepg_observability_spill_files_total_by_segment | Gauge | Total number of spill files per segment. |
warehousepg_observability_spill_bytes_total_by_segment | Gauge | Total bytes spilled per segment. |
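Dividing spilled bytes by the number of spill files gives the average spill file size, which helps distinguish many small spills from a few large ones. The following is a minimal sketch, assuming the per-database (or per-segment) values are already collected into two dictionaries that share the same keys. The function name is hypothetical.

```python
def avg_spill_file_bytes(spill_bytes: dict[str, float], spill_files: dict[str, float]) -> dict[str, float]:
    """Average bytes per spill file, keyed the same way as the inputs.

    Keys with zero spill files are skipped to avoid dividing by zero.
    """
    return {
        key: spill_bytes.get(key, 0.0) / files
        for key, files in spill_files.items()
        if files > 0
    }
```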
Monitoring resource groups
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_resgroup_status_concurrency | Gauge | Configured concurrency limit for the resource group. |
warehousepg_observability_resgroup_status_cpu_max_pt | Gauge | Maximum CPU percentage limit for the resource group. |
warehousepg_observability_resgroup_status_running_queries | Gauge | Number of queries currently running in the resource group. |
warehousepg_observability_resgroup_status_queued_queries | Gauge | Number of queries currently queued in the resource group. |
warehousepg_observability_resgroup_status_total_queued | Counter | Total number of queries ever queued in the resource group. |
warehousepg_observability_resgroup_status_total_executed | Counter | Total number of queries ever executed in the resource group. |
warehousepg_observability_resgroup_status_queue_time_seconds | Counter | Cumulative time queries have spent queued in the resource group, in seconds. |
warehousepg_observability_resgroup_host_cpu_usage | Gauge | CPU usage for a resource group on a specific host. |
warehousepg_observability_resgroup_host_memory_usage | Gauge | Memory usage for a resource group on a specific host. |
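The three resource group counters only grow, so their useful signal is the increase over an interval (in PromQL, typically through `increase()` or `rate()`). The following Python sketch derives one such value: the approximate average time a newly queued query waited, computed from two readings of the same resource group. Treating a decrease as a counter reset is similar to how Prometheus handles resets. The function names are hypothetical, and the result is approximate because this page does not specify exactly when `queue_time_seconds` is incremented.

```python
from typing import Optional

TOTAL_QUEUED = "warehousepg_observability_resgroup_status_total_queued"
QUEUE_TIME = "warehousepg_observability_resgroup_status_queue_time_seconds"


def counter_delta(previous: float, current: float) -> float:
    """Increase of a counter between two readings.

    A decrease is treated as a counter reset (for example, after a restart);
    the current value is then the best available estimate of the increase.
    """
    return current - previous if current >= previous else current


def avg_queue_wait_seconds(prev: dict[str, float], curr: dict[str, float]) -> Optional[float]:
    """Approximate mean queue wait per newly queued query for one resource group.

    `prev` and `curr` hold readings of both counters taken at the start and
    end of the interval.
    """
    queued = counter_delta(prev[TOTAL_QUEUED], curr[TOTAL_QUEUED])
    if queued <= 0:
        return None  # no queries were queued during the interval
    return counter_delta(prev[QUEUE_TIME], curr[QUEUE_TIME]) / queued
```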
Collecting host hardware metrics
These metrics reflect the physical resources of each cluster host.
| Metric | Type | Description |
|---|---|---|
warehousepg_observability_node_cpu_seconds_total | Counter | CPU time, in seconds, accumulated in each mode (user, system, idle, iowait). |
warehousepg_observability_node_memory_MemTotal_bytes | Gauge | Total physical memory on the host, in bytes. |
warehousepg_observability_node_memory_MemAvailable_bytes | Gauge | Available memory on the host, in bytes. |
warehousepg_observability_node_memory_Cached_bytes | Gauge | Memory used by the OS page cache, in bytes. |
warehousepg_observability_node_disk_read_bytes_total | Counter | Total bytes read from disk. |
warehousepg_observability_node_disk_written_bytes_total | Counter | Total bytes written to disk. |
warehousepg_observability_node_network_receive_bytes_total | Counter | Total bytes received over the network. |
warehousepg_observability_node_network_transmit_bytes_total | Counter | Total bytes transmitted over the network. |
warehousepg_observability_node_load1 | Gauge | 1-minute load average. |
warehousepg_observability_node_load5 | Gauge | 5-minute load average. |
warehousepg_observability_node_load15 | Gauge | 15-minute load average. |
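One common way to derive host CPU utilization from `warehousepg_observability_node_cpu_seconds_total` is the share of non-idle time between two readings; memory pressure can be derived from the ratio of available to total memory. The following is a minimal sketch, assuming the per-mode CPU counters for one host have already been summed across that host's CPUs and that no counter reset occurred between the two readings. It counts `iowait` as busy time, which is a choice rather than a rule. The function names are hypothetical.

```python
def cpu_busy_fraction(prev: dict[str, float], curr: dict[str, float]) -> float:
    """Fraction of CPU time not spent idle between two readings.

    `prev` and `curr` map a mode name (such as "user" or "idle") to the
    cumulative CPU seconds for that mode. Assumes no counter reset occurred.
    """
    deltas = {mode: curr[mode] - prev.get(mode, 0.0) for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    # Every non-idle mode, including iowait, counts as busy here.
    return 1.0 - deltas.get("idle", 0.0) / total


def memory_used_fraction(mem_total_bytes: float, mem_available_bytes: float) -> float:
    """Share of physical memory that is not reported as available."""
    return 1.0 - mem_available_bytes / mem_total_bytes
```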
WEM internal metrics
These metrics describe the health and behavior of WEM itself. Prometheus scrapes them directly from the /prom/metrics endpoint.
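Because Prometheus scrapes `/prom/metrics` directly, you can also read the endpoint with a small script for ad hoc checks. The following sketch assumes the endpoint returns the standard Prometheus text exposition format. The WEM base URL is a placeholder you supply, the function names are hypothetical, and the parser is deliberately minimal: it skips `# HELP` and `# TYPE` lines and does not unescape label values.

```python
import re
import urllib.request

# Matches `name{labels} value` or `name value`, with an optional trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+\S+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_wem_metrics(base_url: str) -> list[tuple[str, dict[str, str], float]]:
    """Fetch /prom/metrics from a WEM instance and keep only wem_ samples."""
    with urllib.request.urlopen(f"{base_url}/prom/metrics", timeout=10) as resp:
        text = resp.read().decode("utf-8")
    return [s for s in parse_exposition(text) if s[0].startswith("wem_")]
```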
Monitoring canary checks
| Metric | Type | Description |
|---|---|---|
wem_canary_duration_ms | Gauge | Execution time of the most recent canary check run, in milliseconds. |
wem_canary_status | Gauge | Result of the most recent canary check run (0=success, 1=warning, 2=critical). |
wem_canary_last_run_timestamp | Gauge | Unix timestamp of the most recent canary check execution. |
wem_canary_row_count | Gauge | Row count returned by the most recent canary check query. |
wem_canary_checks_total | Counter | Total number of canary check executions since WEM started. |
wem_canary_failures_total | Counter | Total number of canary check executions that returned a warning or critical result. |
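The canary metrics combine naturally into a status report: the numeric status maps to a name, the last-run timestamp shows whether checks have stopped, and the two counters give a failure ratio since WEM started. In the following sketch, the staleness threshold `max_age_s` and the function name are hypothetical; pick a threshold that suits your canary schedule.

```python
import time
from typing import Optional

# Status codes as documented for wem_canary_status.
CANARY_STATUS = {0: "success", 1: "warning", 2: "critical"}


def canary_report(status: float, last_run_ts: float, checks_total: float,
                  failures_total: float, max_age_s: float = 600.0,
                  now: Optional[float] = None) -> dict:
    """Summarize the canary gauges and counters.

    `now` defaults to the current Unix time; pass a value to make results reproducible.
    """
    now = time.time() if now is None else now
    return {
        "status": CANARY_STATUS.get(int(status), "unknown"),
        # True if the most recent run is older than the chosen threshold.
        "stale": (now - last_run_ts) > max_age_s,
        # Share of runs since WEM started that returned warning or critical.
        "failure_ratio": failures_total / checks_total if checks_total else 0.0,
    }
```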
Tracking WEM system state
| Metric | Type | Description |
|---|---|---|
wem_up | Gauge | Indicates whether the WEM process is running (1=running). |
wem_scheduler_lock_held | Gauge | Indicates whether this WEM instance currently holds the scheduler lock and is actively running canary checks (1=held). |
wem_build_info | Gauge | Build metadata for this WEM instance, carried in the series labels. The value is always 1. |
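When several WEM instances run side by side, comparing `wem_scheduler_lock_held` across them shows which one is running canary checks. The following sketch assumes you have collected the gauge from every instance into a dictionary keyed by instance identifier (for example, the `instance` label that Prometheus attaches to scraped targets). The function name and messages are hypothetical.

```python
def scheduler_lock_check(lock_held_by_instance: dict[str, float]) -> str:
    """Classify wem_scheduler_lock_held values gathered from every WEM instance."""
    holders = [inst for inst, held in lock_held_by_instance.items() if held == 1]
    if not holders:
        return "no instance holds the scheduler lock; canary checks are not running"
    if len(holders) > 1:
        # A lock should not have more than one holder, so flag this case.
        return "multiple instances report holding the lock: " + ", ".join(sorted(holders))
    return f"lock held by {holders[0]}"
```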
Monitoring connection pools
| Metric | Type | Description |
|---|---|---|
wem_pool_total_conns | Gauge | Total number of connections in the pool (idle and acquired). |
wem_pool_idle_conns | Gauge | Number of idle connections available in the pool. |
wem_pool_acquired_conns | Gauge | Number of connections currently in use. |
wem_pool_max_conns | Gauge | Maximum number of connections configured for the pool. |
wem_pool_utilization_percent | Gauge | Pool utilization expressed as a percentage of the maximum connection limit. |
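The pool gauges can be cross-checked against each other. The following sketch recomputes utilization and remaining headroom. It assumes `wem_pool_utilization_percent` equals acquired connections divided by the maximum, times 100, which is one reading of its description rather than a documented formula. The function name is hypothetical.

```python
def pool_snapshot(total: float, idle: float, acquired: float, max_conns: float) -> dict:
    """Recompute derived values from the raw connection pool gauges."""
    return {
        # Assumed formula: acquired connections as a percentage of the maximum.
        "utilization_percent": 100.0 * acquired / max_conns if max_conns else 0.0,
        # wem_pool_total_conns is described as idle plus acquired connections.
        "consistent": total == idle + acquired,
        # Connections that could still be opened before reaching the limit.
        "headroom": max_conns - total,
    }
```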