Troubleshoot common scenarios

This how-to guide provides common troubleshooting scenarios you may encounter when operating Hybrid Manager and EDB-managed Postgres clusters on Kubernetes.

For each scenario, we show:

  • The problem or symptom (stated in the scenario title)
  • Why tracing Kubernetes resources helps
  • Key Kubernetes checks
  • An example command sequence, using placeholder names you substitute with your own

Scenario: Cannot connect to the Postgres database externally

Why trace: External connections depend on Kubernetes Services, the LoadBalancer, and external network configuration.

Kubernetes checks:

  • kubectl get service -n <project-namespace> → Look for TYPE=LoadBalancer, check EXTERNAL-IP.
  • kubectl get ingress -n <project-namespace> (if used).
  • kubectl describe service <db-external-svc-name> -n <project-namespace> → Check Endpoints and annotations.
  • kubectl get pods -n <project-namespace> → Ensure DB Pods are Running.
  • Cloud Console → Check Load Balancer health, firewall/SecurityGroup rules.
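For example, a minimal trace from the Service down to the Pods behind it. The namespace ns-example and Service name db-external-svc are hypothetical placeholders; substitute the names from your environment:

```shell
# Hypothetical placeholder names; replace with your own
NS=ns-example
SVC=db-external-svc

# Confirm the Service is TYPE=LoadBalancer and has an EXTERNAL-IP assigned
kubectl get service "$SVC" -n "$NS" -o wide

# Check that the Service has Endpoints (that is, Ready Pods backing it)
kubectl describe service "$SVC" -n "$NS" | grep -A 2 Endpoints

# Verify the database Pods are Running
kubectl get pods -n "$NS" -o wide
```

If EXTERNAL-IP stays in <pending>, the cloud load balancer was never provisioned; the Service's Events and your cloud quotas and firewall rules are the next places to look.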

Scenario: Application cannot connect to the database from within the cluster

Why trace: Internal Service and DNS resolution must be correct.

Kubernetes checks:

  • kubectl get service -n <project-namespace> → Check ClusterIP services.
  • kubectl get endpoints <db-service-name> -n <project-namespace> → Verify correct Pod IPs.
  • kubectl describe pod <db-pod-name> -n <project-namespace> → Pod IP, status, events.
  • From another Pod: nslookup <db-service-name>.<project-namespace>.svc.cluster.local
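For example, you can run a short-lived Pod to test DNS resolution from inside the cluster. This is a sketch that assumes the busybox image is pullable in your environment; all names are placeholders:

```shell
NS=ns-example
SVC=db-service

# Verify the Service's Endpoints list the expected Pod IPs
kubectl get endpoints "$SVC" -n "$NS"

# Resolve the Service DNS name from a temporary Pod, then clean it up
kubectl run dns-test --rm -it --restart=Never --image=busybox -n "$NS" -- \
  nslookup "$SVC.$NS.svc.cluster.local"
```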

Scenario: HCP Postgres Pod is not starting

Why trace: Pod Events and PVC/Storage issues are common causes.

Kubernetes checks:

  • kubectl describe pod <pod-name> -n <project-namespace> → Look at Events section.
  • kubectl logs <pod-name> -n <project-namespace>
  • kubectl logs --previous <pod-name> -n <project-namespace> (if restarting).
  • kubectl get pvc -n <project-namespace> → PVCs Bound?
  • kubectl describe pvc <pvc-name> -n <project-namespace>
  • kubectl describe node <node-name> → Resource availability.
  • Check the image pull secret if the Pod reports ImagePullBackOff.
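A typical first pass over a stuck Pod might look like this (the Pod and namespace names are placeholders):

```shell
NS=ns-example
POD=db-pod-1

# The Events section at the end of describe usually names the blocker
# (FailedScheduling, FailedMount, ImagePullBackOff, ...)
kubectl describe pod "$POD" -n "$NS"

# Current logs, and previous logs if the container is crash-looping
kubectl logs "$POD" -n "$NS"
kubectl logs --previous "$POD" -n "$NS"

# Every PVC the Pod mounts must be Bound before it can start
kubectl get pvc -n "$NS"
```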

Scenario: HCP Operator Pod is not running or is erroring

Why trace: Operator health affects cluster reconciliation.

Kubernetes checks:

  • kubectl get pods -n <operator-namespace>
  • kubectl describe pod <operator-pod-name> -n <operator-namespace>
  • kubectl logs <operator-pod-name> -n <operator-namespace>
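For example (the Pod name below is hypothetical; use the values from your installation):

```shell
# Operator namespace used elsewhere on this page; yours may differ
OP_NS=postgresql-operator-system

# Find the operator Pod and confirm it is Running and Ready
kubectl get pods -n "$OP_NS"

# Inspect the Pod found above (hypothetical name shown)
OP_POD=operator-pod-1
kubectl describe pod "$OP_POD" -n "$OP_NS"
kubectl logs "$OP_POD" -n "$OP_NS" --tail=100
```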

Scenario: Creating/Updating an HCP Cluster CR doesn't result in changes

Why trace: Changes flow through operator reconciliation, so you need to verify the operator logs and the CR's Status and Events.

Kubernetes checks:

  • kubectl logs <operator-pod-name> -n <operator-namespace> → Look for CR-related errors.
  • kubectl get <crd-plural-name> <hcp-cluster-name> -n <project-namespace> -o yaml → Check Status section.
  • kubectl describe <crd-plural-name> <hcp-cluster-name> -n <project-namespace> → Check Events.
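For example, using the cluster CR type shown elsewhere on this page (the cluster, namespace, and operator Pod names are placeholders):

```shell
NS=ns-example
CLUSTER=my-hcp-cluster

# Events on the CR record each reconciliation attempt and any failures
kubectl describe cluster "$CLUSTER" -n "$NS"

# The status section is what the operator last observed
kubectl get cluster "$CLUSTER" -n "$NS" -o yaml | grep -A 20 '^status:'

# Cross-check the operator log for messages that mention this CR
kubectl logs operator-pod-1 -n postgresql-operator-system | grep "$CLUSTER"
```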

Scenario: Backups are failing

Why trace: Backup agents depend on Secrets, cloud IAM configuration, and storage access.

Kubernetes checks:

  • kubectl get pods,jobs,cronjobs -n <namespace-where-backups-run>
  • kubectl logs <backup-pod/job-pod-name> -n <namespace-where-backups-run>
  • kubectl get secret edb-object-storage -n <project-namespace> -o yaml → Verify keys.
  • kubectl describe sa <db-pod-service-account> -n <project-namespace> → Check IRSA/Workload Identity annotations.
  • Cloud Console → Check bucket permissions and IAM roles.
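For example, to verify the object-storage Secret and service account wiring without printing credential values (all names other than edb-object-storage are placeholders):

```shell
NS=ns-example

# List only the key names in the object-storage Secret, not the values
kubectl get secret edb-object-storage -n "$NS" \
  -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'

# IRSA / Workload Identity bindings appear as service account annotations
kubectl describe sa db-service-account -n "$NS" | grep -i -A 3 annotations

# Recent backup Jobs and their Pods show where a run failed
kubectl get pods,jobs,cronjobs -n "$NS"
```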

Scenario: Storage issues (PVC stuck in Pending, slow performance)

Why trace: PVC provisioning and cloud storage health need to be validated.

Kubernetes checks:

  • kubectl describe pvc <pvc-name> -n <project-namespace> → Events for provisioning errors.
  • kubectl get storageclass
  • kubectl describe storageclass <sc-name>
  • kubectl get pods -n kube-system | grep csi → Check CSI driver pods.
  • kubectl logs <csi-driver-pod-name> -n kube-system
  • Cloud Console → Check volume metrics (IOPS, latency, throughput).
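For example, to trace a Pending PVC back through its StorageClass to the CSI driver (the PVC, namespace, and StorageClass names are placeholders):

```shell
NS=ns-example
PVC=db-pvc-1

# Provisioning failures show up in the PVC's Events
kubectl describe pvc "$PVC" -n "$NS"

# Identify which StorageClass (and therefore which provisioner) is in play
kubectl get pvc "$PVC" -n "$NS" -o jsonpath='{.spec.storageClassName}{"\n"}'
kubectl describe storageclass standard   # hypothetical StorageClass name

# The CSI driver Pods for that provisioner must be healthy
kubectl get pods -n kube-system | grep -i csi
```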

Scenario: My HCP cluster or Postgres instance seems slow or unresponsive

Why trace: Resource limits and underlying node/storage pressure are common causes.

Kubernetes checks:

  • kubectl top pods -n <project-namespace>
  • kubectl describe pod <db-pod-name> -n <project-namespace> → Check Requests and Limits.
  • kubectl top nodes
  • Cloud Console → Check storage volume metrics.
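For example, to compare live usage against configured limits (kubectl top requires the metrics-server to be installed; names are placeholders):

```shell
NS=ns-example
POD=db-pod-1

# Live CPU and memory per Pod; usage near the limit means throttling or OOM risk
kubectl top pods -n "$NS"

# The configured requests and limits for each container in the Pod
kubectl get pod "$POD" -n "$NS" \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'

# Node-level pressure slows every Pod scheduled on that node
kubectl top nodes
```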

Scenario: I need to increase resources (CPU, Memory, Storage) for my HCP cluster

Why trace: Must validate Kubernetes capacity before scaling Postgres settings.

Kubernetes checks:

  • kubectl describe node <node-name> → Allocatable resources.
  • kubectl get nodes → Node availability.
  • kubectl describe storageclass <sc-name> → allowVolumeExpansion: true?
  • Cloud Console → Instance limits and quotas.
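For example, to confirm there is headroom before scaling up (the node name is a placeholder):

```shell
# Allocatable capacity minus current requests is the available headroom
kubectl describe node node-1 | grep -A 8 'Allocated resources'

# Storage can only be resized in place if the StorageClass allows expansion
kubectl get storageclass \
  -o custom-columns=NAME:.metadata.name,EXPANSION:.allowVolumeExpansion
```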

Scenario: The HCP UI shows my cluster is "Pending", "Provisioning", "Stalled", or "Error"

Why trace: The UI reflects operator state, so check the underlying Kubernetes resources.

Kubernetes checks:

  • kubectl get pods -n <project-namespace> → Pod readiness.
  • kubectl describe pod <pod-name> -n <project-namespace> → Events for scheduling/mounting issues.
  • kubectl get pvc -n <project-namespace>
  • kubectl describe pvc <pvc-name> -n <project-namespace>
  • kubectl describe cluster <hcp-cluster-name> -n <project-namespace> → Status and Events.
  • kubectl logs <operator-pod-name> -n <operator-namespace>
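For example, a quick sweep from Pods down to the CR and the operator (names are placeholders):

```shell
NS=ns-example
CLUSTER=my-hcp-cluster

# Are the cluster's Pods scheduled and Ready?
kubectl get pods -n "$NS"

# The CR's Status and Events summarize what the operator thinks is wrong
kubectl describe cluster "$CLUSTER" -n "$NS" | tail -n 30

# A PVC stuck in Pending blocks provisioning entirely
kubectl get pvc -n "$NS"
```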

Scenario: Debugging a specific Postgres instance (connection refused, error message)

Why trace: Postgres server logs live inside the Kubernetes Pod.

Kubernetes checks:

  • kubectl get pods -n <project-namespace> → Identify DB Pod.
  • kubectl logs <db-pod-name> -n <project-namespace>
  • kubectl logs -f <db-pod-name> -n <project-namespace> → Live logs.
  • kubectl logs --previous <db-pod-name> -n <project-namespace>
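For example (the Pod name is a placeholder):

```shell
NS=ns-example
POD=db-pod-1

# Follow the live Postgres server log stream
kubectl logs -f "$POD" -n "$NS"

# After a crash or restart, the relevant error is usually in the previous logs
kubectl logs --previous "$POD" -n "$NS" | tail -n 50
```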

Scenario: Building custom monitoring dashboards

Why trace: Identify the right metrics endpoints and labels.

Kubernetes checks:

  • kubectl describe pod <db-pod-name> -n <project-namespace> → Prometheus scrape annotations.
  • kubectl get service -n <project-namespace> → Metrics Service.
  • kubectl get servicemonitor,podmonitor -n <project-namespace> → Prometheus Operator monitoring resources.
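For example, to discover where metrics are exposed (names are placeholders; the servicemonitor and podmonitor resource types exist only if the Prometheus Operator CRDs are installed):

```shell
NS=ns-example
POD=db-pod-1

# Scrape annotations (if used) appear in the Pod's metadata
kubectl get pod "$POD" -n "$NS" -o jsonpath='{.metadata.annotations}{"\n"}'

# Services that expose a metrics port
kubectl get service -n "$NS"

# Prometheus Operator discovery objects, if that stack is installed
kubectl get servicemonitor,podmonitor -n "$NS"
```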

Scenario: Troubleshooting HCP-specific issues

Why trace: Hybrid Manager integrates with Kubernetes and relies on operator health, CRD reconciliation, and observability tooling.

Kubernetes checks:

  • Check the health of HCP components:
      • kubectl get pods -n postgresql-operator-system
      • kubectl describe pod <operator-pod> -n postgresql-operator-system
      • kubectl logs <operator-pod> -n postgresql-operator-system
  • Confirm your cluster Custom Resource (CR) status:
      • kubectl describe cluster <hcp-cluster-name> -n <namespace>
      • kubectl get cluster <hcp-cluster-name> -n <namespace> -o yaml
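Putting these together, a minimal health sweep might look like this (the cluster name and namespace are placeholders):

```shell
NS=ns-example
CLUSTER=my-hcp-cluster

# Operator health first: reconciliation stops entirely if the operator is down
kubectl get pods -n postgresql-operator-system

# Then the status the operator publishes on your cluster CR
kubectl get cluster "$CLUSTER" -n "$NS" -o yaml | grep -A 15 '^status:'
```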
