Monitor and resolve resource exhaustion in Kubernetes

When running Postgres in Kubernetes using Hybrid Manager, resource exhaustion at the Pod or Node level can cause degraded performance, failed scheduling, or crashes.

This guide explains how to detect and resolve CPU, memory, and I/O saturation issues across your cluster.


What to monitor

Pod-level metrics

  • CPU and memory usage vs. limits
  • OOMKilled container restarts
  • Throttling events (CPU CFS quotas)
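
One way to check for these conditions from the command line is shown below (a sketch; <project-namespace> and <pod-name> are placeholders, and the cgroup path depends on whether the node uses cgroup v1 or v2):

# List each Pod with the last termination reason of its containers; OOMKilled means the memory limit was hit
kubectl get pods -n <project-namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'

# Inspect CFS throttling counters from inside a container (cgroup v2 path shown; cgroup v1 uses /sys/fs/cgroup/cpu/cpu.stat)
kubectl exec -n <project-namespace> <pod-name> -- cat /sys/fs/cgroup/cpu.stat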

Node-level metrics

  • Overall node resource pressure
  • Pod evictions due to memory pressure
  • Available vs. allocatable CPU/Memory
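
To see at a glance which nodes report pressure conditions, one option is to read the node status conditions directly (a sketch using standard kubectl JSONPath; a status of True means the node is under that pressure):

# Per-node MemoryPressure and DiskPressure conditions
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}MemoryPressure={.status.conditions[?(@.type=="MemoryPressure")].status}{"\t"}DiskPressure={.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'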

Volume-level metrics

  • PVC IOPS and throughput
  • Pod wait times on volume access
  • Volume provisioning latency
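
For a quick look at capacity pressure on the data volume, you can check filesystem usage from inside the Postgres Pod (a sketch; assumes the container image includes standard shell utilities):

# Look for the mount that backs the data PVC and watch its Use% column
kubectl exec -n <project-namespace> <pod-name> -- df -h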

Commands

Check usage and pressure

kubectl top pods -n <project-namespace>
kubectl top nodes
kubectl describe pod <pod-name> -n <project-namespace>
kubectl describe node <node-name>
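
If your kubectl and metrics-server support them, the --containers and --sort-by flags make it easier to spot the heaviest consumers (a sketch):

# Per-container usage, sorted by the largest memory consumers
kubectl top pods -n <project-namespace> --containers --sort-by=memory
# Pods sorted by CPU usage
kubectl top pods -n <project-namespace> --sort-by=cpu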

Investigate storage

kubectl get pvc -n <project-namespace>
kubectl describe pvc <pvc-name> -n <project-namespace>
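
Recent events often explain provisioning or attach delays. A sketch using standard field selectors:

# Events involving PVCs in the namespace (provisioning failures, attach or resize problems)
kubectl get events -n <project-namespace> --field-selector involvedObject.kind=PersistentVolumeClaim --sort-by=.lastTimestamp
# StorageClasses available if you need a faster tier or volume expansion
kubectl get storageclass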

Common symptoms and resolutions

Symptom | Cause | Resolution
------- | ----- | ----------
Pod stuck in Pending | Not enough resources on nodes | Increase node pool size or switch to a larger instance type
Frequent restarts (OOMKilled) | Memory limit too low | Increase memory requests/limits for affected Pods
CPU throttling | CPU limit too low | Increase CPU limits; set higher requests for scheduler preference
Disk I/O latency | Volume throughput limits reached | Use a higher-performance StorageClass or resize the disk
Cluster scaling fails | Node quota exceeded or autoscaler off | Check cloud quota; ensure the autoscaler is enabled and working
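
One way to confirm the scheduling and eviction symptoms above is to filter cluster events by reason (a sketch; event retention is limited, so run this soon after the symptom appears):

# Pods the scheduler could not place (look for "Insufficient cpu" or "Insufficient memory" in the message)
kubectl get events -n <project-namespace> --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
# Pods evicted due to node pressure, across all namespaces
kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp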

How to resolve

1. Identify the bottleneck

  • Use kubectl top and kubectl describe to find Pods or Nodes under pressure
  • Use cloud console metrics (e.g., GCP Monitoring, AWS CloudWatch)
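
For example, to tie a hot Pod to the Node it runs on (a sketch; "Allocated resources" is the section kubectl describe node prints for requested CPU and memory):

# Find the node the Pod is scheduled on
kubectl get pod <pod-name> -n <project-namespace> -o wide
# Check how much of that node's CPU and memory is already requested
kubectl describe node <node-name> | grep -A 10 "Allocated resources"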

2. Edit resource limits

Update the Postgres cluster custom resource YAML (via GitOps or CLI):

resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"

Apply the change with kubectl apply -f <cluster-cr.yaml>.
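
To confirm the new values reached the running Pods, you can inspect the container spec directly. The patch shown second is only a sketch: it assumes your Postgres cluster CR is addressable as a cluster resource and keeps resources under .spec.resources, so check your CR's schema (or stick with GitOps) before using it.

# Verify the resources actually applied to a Pod
kubectl get pod <pod-name> -n <project-namespace> -o jsonpath='{.spec.containers[*].resources}'
# Hypothetical in-place patch; adjust the resource kind and field path to match your CR
kubectl patch cluster <cluster-name> -n <project-namespace> --type merge \
  -p '{"spec":{"resources":{"requests":{"cpu":"2","memory":"4Gi"},"limits":{"cpu":"4","memory":"8Gi"}}}}'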

3. Adjust cluster infrastructure

  • Add more nodes to your node pool
  • Switch to larger VM instance types
  • Enable Cluster Autoscaler to allow dynamic scaling
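
If you run the standard Kubernetes Cluster Autoscaler, it publishes its state in a ConfigMap you can inspect (a sketch; the ConfigMap name and namespace assume the default deployment):

# Scale-up/scale-down activity, node group health, and recent errors
kubectl describe configmap cluster-autoscaler-status -n kube-system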

4. Monitor continuously

After applying changes, keep watching the Pod-, Node-, and volume-level metrics listed above to confirm that the pressure is resolved and does not return.

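A simple loop is often enough to keep an eye on usage while changes settle (a sketch using standard shell):

# Re-check Pod and Node usage every minute until the pressure clears
while true; do
  kubectl top pods -n <project-namespace>
  kubectl top nodes
  sleep 60
done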
