Kubernetes monitoring best practices

Hybrid Manager runs on Kubernetes and depends on Kubernetes infrastructure to deliver availability, scalability, and performance. Effective monitoring of Kubernetes clusters is essential for maintaining the health of Hybrid Manager components and managed Postgres services.

This page provides monitoring best practices for Kubernetes clusters used with Hybrid Manager and EDB Platform services. It complements the general Minimum and recommended Kubernetes specs, Kubernetes storage best practices, and Kubernetes networking best practices.

For background on how Kubernetes is used in Hybrid Manager, see Kubernetes in Hybrid Manager.

Use a complete observability stack

A robust observability stack for Kubernetes should include:

  • Metrics → Prometheus (or cloud-native equivalent)
  • Dashboards → Grafana
  • Logs → Loki, Fluent Bit, or EFK stack (Elasticsearch, Fluentd, Kibana)
  • Tracing (optional) → OpenTelemetry, Jaeger

Hybrid Manager components expose metrics and logs that can be integrated into this stack.

Monitor Kubernetes cluster health

Key metrics to monitor:

  • Node status (Ready/NotReady), node resource usage (CPU, memory, disk)
  • Pod status (Running/Pending/CrashLoopBackOff), Pod restarts
  • PersistentVolumeClaim (PVC) capacity and usage
  • Control plane API server health and latency
  • etcd health (if self-managed Kubernetes)
  • Kubernetes component logs (API server, scheduler, controller-manager)

Monitor Hybrid Manager platform components

Key components to monitor:

  • Hybrid Manager UI and API services (availability, latency, error rates)
  • Operators (Postgres operator, PGD operator)
  • Backup agents and CronJobs
  • Monitoring stack components themselves (Prometheus, Grafana, Loki)

Dashboards should provide visibility into:

  • Component availability and health
  • Component resource consumption trends
  • Component-specific error rates or alerts

Monitor managed Postgres clusters

Database-specific monitoring should cover:

  • Postgres instance availability
  • Replication status and lag
  • Storage usage and IOPS
  • Connection counts and slow queries
  • WAL archive status (if using external backups)
  • Resource usage (CPU, memory) of database Pods

Hybrid Manager-managed Postgres clusters expose standard Postgres metrics via built-in exporters or sidecars.

Integrate with external monitoring

Many organizations integrate Kubernetes observability with external tools:

  • Cloud-native monitoring services (CloudWatch, Stackdriver, Azure Monitor)
  • Centralized enterprise observability platforms (Datadog, Splunk, New Relic)
  • SIEM systems (for log ingestion and security monitoring)

Hybrid Manager environments should be integrated into broader observability pipelines where required.

Set meaningful alerts

Recommended alerting patterns:

  • Node NotReady or significant resource pressure
  • Pod CrashLoopBackOff for Hybrid Manager or Postgres components
  • PVC approaching full capacity
  • Postgres replication lag exceeds threshold
  • High error rates on API or UI components
  • Missing expected metrics (indicates possible monitoring pipeline issue)

Avoid overly noisy or generic alerts — tune thresholds to reflect real operational concerns.

Validate observability setup regularly

Operational best practices:

  • Periodically test and validate dashboards, alerts, and data completeness
  • Run synthetic checks for critical services (UI, API endpoints, Postgres connectivity)
  • Validate that log ingestion is complete and accurate
  • Test alert routing to appropriate on-call responders

Summary checklist

  • Deploy a complete observability stack (metrics, dashboards, logs)
  • Monitor Kubernetes infrastructure health continuously
  • Monitor Hybrid Manager components and operators
  • Monitor managed Postgres clusters comprehensively
  • Integrate with external observability and security tools as needed
  • Define and tune actionable alerts
  • Periodically validate observability pipelines and coverage

Could this page be better? Report a problem or suggest an addition!