Troubleshoot Hybrid Manager components
This how-to guide explains how to troubleshoot Hybrid Manager (HM) platform components running on Kubernetes.
This covers platform services such as:
- Hybrid Manager UI and API services
- Postgres operator
- PGD operator
- Transporter services
- Storage location operator
- Agent and telemetry components
- Other UPM platform services
For guidance on managing Postgres clusters, see Troubleshooting Kubernetes
Understand Hybrid Manager component architecture
Hybrid Manager services are deployed across multiple namespaces:
upm-ui→ UI serviceupm-api-*→ API servicespostgresql-operator-system→ Postgres operatorpgd-operator-system→ PGD operator (if used)transporter-*→ Transporter servicesstorage-location-operator→ Storage location servicesupm-beacon,upm-beaco-ff-base, otherupm-*namespaces → telemetry and support components
Each service typically runs as a Kubernetes Deployment with associated Pods.
Common symptoms and investigation steps
Hybrid Manager UI not responding
Symptoms:
- UI unavailable or slow
- 5xx errors in browser
Investigation:
- Check UI Pod status:
kubectl get pods -n upm-ui - Check UI Pod logs:
kubectl logs <pod-name> -n upm-ui - Check associated Ingress or LoadBalancer health
- Validate API services are reachable — UI depends on API
Hybrid Manager API issues
Symptoms:
- API returns errors or is unreachable
- UI shows partial data or errors
Investigation:
- Check API service Pods:
kubectl get pods -n upm-api-* - Check API Pod logs for errors:
kubectl logs <pod-name> -n <upm-api-*> - Validate Service Endpoints are correct
- Check Postgres operator health — API depends on operator state
Operator errors
Symptoms:
- Postgres clusters not provisioning
- Cluster stuck in Pending or Error state
- Operator reconciliation errors
Investigation:
- Check Postgres operator Pods:
kubectl get pods -n postgresql-operator-system - Check operator logs:
kubectl logs <pod-name> -n postgresql-operator-system - Check Cluster CR status and Events
- Validate associated PVCs and Services
Transporter or backup issues
Symptoms:
- Backups failing or not visible
- Transporter jobs not running
Investigation:
- Check Transporter Pods:
kubectl get pods -n transporter-* - Check relevant Pod logs for errors
- Validate IRSA or Workload Identity configuration
- Check object storage access credentials
Agent and telemetry issues
Symptoms:
- Monitoring not available
- Missing metrics or events
Investigation:
- Check
upm-beaconandupm-beaco-ff-basePods - Validate Fluent Bit → Loki pipeline (see Use EDB observability stack)
- Check Prometheus targets and scraping status
Log locations and patterns
Component logs are typically available in:
- Kubernetes logs:
kubectl logs - Grafana Explore → Loki: logs aggregated across components
Common log patterns:
- Operator reconciliation errors → in operator logs
- API call errors → in API service logs
- UI errors → in UI service logs
- Backup errors → in Transporter or Agent logs
- Storage access errors → in Storage location operator logs
When to restart components
If a component is consistently failing or stuck:
- Follow Restart Hybrid Manager components procedure.
- Monitor after restart to validate behavior.
When to escalate
Escalate to support when:
- Persistent errors remain after component restart
- Operator unable to reconcile Cluster CR
- Data integrity concerns arise
- Multiple Hybrid Manager components are failing simultaneously
Capture relevant logs and Events before escalation.
Summary checklist
- Understand Hybrid Manager component architecture and namespaces.
- Investigate Pods and logs for relevant component.
- Validate networking, storage, and dependent services.
- Use observability stack (Grafana, Loki) to correlate logs and metrics.
- Restart components when appropriate.
- Escalate complex issues with supporting data.
Related topics
Could this page be better? Report a problem or suggest an addition!