Monitor Hybrid Manager cluster state
This how-to guide explains how to monitor the state of the Hybrid Manager (HM) cluster and its key components.
This is useful for:
- Understanding current system health
- Monitoring ongoing Hybrid Manager operations
- Establishing baseline cluster state
- Supporting troubleshooting workflows
- Performing periodic system checks
Important: This guide covers monitoring of the Hybrid Manager platform components, not the managed Postgres clusters.
What to monitor
Monitor the following aspects of the Hybrid Manager environment:
- Hybrid Manager UI/API availability and responsiveness
- Operator health and reconciliation activity
- Backup agent and Transporter job status
- Storage location operator status
- Beacon and telemetry activity
- Log pipeline health (Loki / Fluent Bit)
- Kubernetes cluster resource health (nodes, Pods, PVCs, Services)
Key tools and data sources
Use the following tools:
- Grafana dashboards → metrics and high-level system views
- Prometheus → raw metrics exploration
- Loki (via Grafana Explore) → component logs
- kubectl → Kubernetes component status and Events
- Cloud provider dashboards → LoadBalancer health, storage health (optional)
Grafana dashboards to monitor
Recommended dashboards:
- Kubernetes cluster dashboard → node health, resource usage, Pod health
- Hybrid Manager platform dashboard → UI/API availability, operator metrics, Transporter metrics
- Postgres operator dashboard → operator health, reconciliation metrics
- Transporter dashboard → backup and data movement metrics
- Beacon/Telemetry dashboard → data flow to observability stack
Work with your platform team to validate and tune these dashboards.
Key Kubernetes checks
Use kubectl to monitor component health:
- kubectl get pods -A → look for Pending, CrashLoopBackOff, and Error states
- kubectl get nodes → validate node Ready status and resource pressure
- kubectl get pvc -A → monitor PVC capacity and binding status
- kubectl get svc -A → validate LoadBalancer and Service endpoints
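The Pod and node checks above can be narrowed to only the entries that need attention. The following is a minimal sketch, not an official Hybrid Manager tool: the function names unhealthy_pods and not_ready_nodes are hypothetical, and the filters assume the default column layout that kubectl prints with --no-headers.

```shell
# Print pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods -A --no-headers` output on stdin, where the
# default columns are: NAMESPACE NAME READY STATUS RESTARTS AGE.
# (Hypothetical helper name; assumes the default column order.)
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Print nodes whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin, where the
# default columns are: NAME STATUS ROLES AGE VERSION.
# A cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 ": " $2 }'
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl get pods -A --no-headers | unhealthy_pods
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output from both functions means no Pod or node was flagged. A Pod that is Running but not fully ready (for example, READY 0/1) is not caught by this filter, so treat it as a first pass rather than a complete health check.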
Key log checks
Use Loki and/or kubectl logs:
- Look for recent errors in API, UI, operator, Transporter, and Beacon logs.
- Validate regular reconciliation activity in operator logs.
- Validate successful backup and Transporter job completion.
- Look for errors in Fluent Bit or Loki components (log pipeline health).
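For a quick first pass over the log checks above, a simple count of error lines per component is often enough to decide where to look more closely. This is a rough sketch under stated assumptions: count_error_lines is a hypothetical helper, the match is a plain case-insensitive search for the word error, and the namespace and deployment names in the usage comment are placeholders to replace with the names used in your environment.

```shell
# Count the lines on stdin that contain "error" (case-insensitive).
# Prints 0 when nothing matches; `|| true` keeps the exit status at 0
# because grep returns 1 when it finds no matching lines.
# (Hypothetical helper; a plain text match, not a structured log query.)
count_error_lines() {
  grep -ci 'error' || true
}

# Example usage against a live cluster (requires kubectl access).
# <namespace> and <deployment> are placeholders, not real names:
#   kubectl logs -n <namespace> deploy/<deployment> --since=1h | count_error_lines
```

A plain text match also counts benign lines that merely mention the word error, so use the count as a prompt to read the logs, not as a metric.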
Recommended monitoring cadence
- Ongoing → dashboard and alert monitoring
- Daily checks → Pod health, node health, key component logs
- Pre/post change → baseline and compare component state
- Periodic validation → test full monitoring pipeline (logs, metrics, dashboards)
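The pre/post change comparison in the cadence above can be sketched as a snapshot taken before the change and a diff taken after it. The helper names snapshot_pods and compare_snapshots and the file names in the usage comment are hypothetical, and the sketch assumes the default kubectl column layout with --no-headers.

```shell
# Save a sorted snapshot of pod state (namespace, name, status) to a file.
# Reads `kubectl get pods -A --no-headers` output on stdin, where the
# default columns are: NAMESPACE NAME READY STATUS RESTARTS AGE.
# (Hypothetical helper name.)
snapshot_pods() {
  awk '{ print $1, $2, $4 }' | sort > "$1"
}

# Print the lines that differ between two snapshots.
# Prints nothing when they are identical; `|| true` keeps the exit
# status at 0 because diff returns 1 when the files differ.
compare_snapshots() {
  diff "$1" "$2" || true
}

# Example usage around a change (requires kubectl access; file names
# are illustrative):
#   kubectl get pods -A --no-headers | snapshot_pods before.txt
#   ...apply the change...
#   kubectl get pods -A --no-headers | snapshot_pods after.txt
#   compare_snapshots before.txt after.txt
```

Pods recreated during the change get new generated names, so they appear in the diff even when healthy. The status column is what indicates a regression.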
Summary checklist
- Use Grafana dashboards to monitor overall Hybrid Manager platform state.
- Use Prometheus for detailed metric validation.
- Use Loki and kubectl logs for log-based troubleshooting.
- Monitor Kubernetes node, Pod, PVC, and Service health.
- Establish baseline state and monitor deviations.
- Perform regular cluster state validation to detect emerging issues.