Monitor Hybrid Manager cluster state

This how-to guide explains how to monitor the state of the Hybrid Manager (HM) cluster and its key components.

This is useful for:

  • Understanding current system health
  • Monitoring ongoing Hybrid Manager operations
  • Establishing baseline cluster state
  • Supporting troubleshooting workflows
  • Performing periodic system checks

Important: This guide covers monitoring the Hybrid Manager platform components only. It does not cover the managed Postgres clusters.

What to monitor

Monitor the following aspects of the Hybrid Manager environment:

  • Hybrid Manager UI/API availability and responsiveness
  • Operator health and reconciliation activity
  • Backup agent and Transporter job status
  • Storage location operator status
  • Beacon and telemetry activity
  • Log pipeline health (Loki / Fluent Bit)
  • Kubernetes cluster resource health (nodes, Pods, PVCs, Services)

Key tools and data sources

Use the following tools:

  • Grafana dashboards → metrics and high-level system views
  • Prometheus → raw metrics exploration
  • Loki (via Grafana Explore) → component logs
  • kubectl → Kubernetes component status and Events
  • Cloud provider dashboards → LoadBalancer health, storage health (optional)

Grafana dashboards to monitor

Recommended dashboards:

  • Kubernetes cluster dashboard → node health, resource usage, Pod health
  • Hybrid Manager platform dashboard → UI/API availability, operator metrics, Transporter metrics
  • Postgres operator dashboard → operator health, reconciliation metrics
  • Transporter dashboard → backup and data movement metrics
  • Beacon/Telemetry dashboard → data flow to observability stack

Work with your platform team to validate and tune these dashboards.

Key Kubernetes checks

Use kubectl to monitor component health:

  • kubectl get pods -A → look for Pending, CrashLoopBackOff, Error states
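  • kubectl get nodes → validate node Ready status; use kubectl describe node <node-name> to check resource pressure conditions (MemoryPressure, DiskPressure, PIDPressure)
  • kubectl get pvc -A → check PVC binding status (Bound, Pending) and provisioned capacity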
  • kubectl get svc -A → validate LoadBalancer and Service endpoints
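
The checks above can be narrowed to just the problem rows with small filters. The following is a minimal sketch, not part of the Hybrid Manager tooling: the function names are hypothetical, and it assumes the default column layout that kubectl prints with --no-headers.

```shell
# Print pods that are not healthy.
# Input (stdin): output of `kubectl get pods -A --no-headers`
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  awk '
    # Pods from finished Jobs are expected to stay in Completed.
    $4 == "Completed" { next }
    # READY looks like "1/2": ready containers / total containers.
    { split($3, ready, "/") }
    $4 != "Running" || ready[1] != ready[2] {
      print $1 "/" $2 ": " $4 " (" $3 " ready)"
    }
  '
}

# Print nodes whose STATUS is anything other than exactly "Ready".
# Input (stdin): output of `kubectl get nodes --no-headers`
# Assumed columns: NAME STATUS ROLES AGE VERSION
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 ": " $2 }'
}
```

For example, run kubectl get pods -A --no-headers | unhealthy_pods and kubectl get nodes --no-headers | not_ready_nodes. Empty output means nothing was flagged. Cordoned nodes (Ready,SchedulingDisabled) are reported on purpose, so that planned maintenance is visible during a check.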

Key log checks

Use Loki and/or kubectl logs:

  • Look for recent errors in API, UI, operator, Transporter, and Beacon logs.
  • Validate regular reconciliation activity in operator logs.
  • Validate successful backup and Transporter job completion.
  • Look for errors in Fluent Bit or Loki components (log pipeline health).
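
For a quick command-line version of the log checks above, a keyword count over recent log output is often enough to decide whether a component needs a closer look. This is a rough sketch: the function name is hypothetical, and the keyword list is an assumption that should be adjusted to the log format each component actually uses.

```shell
# Count log lines that mention an error-level keyword (case-insensitive).
# Input (stdin): plain log text, for example from `kubectl logs`.
# Prints the number of matching lines; prints 0 when nothing matches.
count_error_lines() {
  # grep exits non-zero when there are no matches, so mask that status.
  grep -ciE 'error|fatal|panic' || true
}
```

For example: kubectl logs -n <namespace> deploy/<deployment> --since=1h | count_error_lines, where <namespace> and <deployment> are placeholders for the component being checked. The count is only a coarse signal, because a line such as errors=0 also matches. Review the matching lines in Loki or with kubectl logs before drawing conclusions.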

Recommended monitoring cadence

  • Ongoing → dashboard and alert monitoring
  • Daily checks → Pod health, node health, key component logs
  • Pre/post change → baseline and compare component state
  • Periodic validation → test full monitoring pipeline (logs, metrics, dashboards)
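
One way to baseline and compare component state around a change is to save a sorted snapshot before and after, then show only the lines that differ. The sketch below assumes plain-text snapshot files. The function name and file names are hypothetical.

```shell
# Print the lines that differ between two saved state snapshots.
# Arguments: $1 = baseline snapshot file, $2 = current snapshot file.
# "<" marks lines found only in the baseline, ">" lines found only in
# the current snapshot. Exit status is 0 when differences were printed
# and 1 when the snapshots match.
state_changes() {
  diff "$1" "$2" | grep '^[<>]'
}
```

A snapshot could be captured with, for example, kubectl get deploy -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,READY:.status.readyReplicas,WANT:.spec.replicas | sort > before.txt, and again into after.txt once the change is complete. Running state_changes before.txt after.txt then lists only the Deployments whose replica counts changed. Deployments are used here rather than Pods because Pod names change on every rollout and would make the comparison noisy.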

Summary checklist

  • Use Grafana dashboards to monitor overall Hybrid Manager platform state.
  • Use Prometheus for detailed metric validation.
  • Use Loki and kubectl logs for log-based troubleshooting.
  • Monitor Kubernetes node, Pod, PVC, and Service health.
  • Establish baseline state and monitor deviations.
  • Perform regular cluster state validation to detect emerging issues.
