Troubleshoot Hybrid Manager components

This how-to guide explains how to troubleshoot Hybrid Manager (HM) platform components running on Kubernetes.

This covers platform services such as:

Hybrid Manager UI and API services
Postgres operator
PGD operator
Transporter services
Storage location operator
Agent and telemetry components
Other UPM platform services

For guidance on troubleshooting Postgres clusters themselves, see Troubleshooting Kubernetes.

Understand Hybrid Manager component architecture

Hybrid Manager services are deployed across multiple namespaces:

upm-ui → UI service
upm-api-* → API services
postgresql-operator-system → Postgres operator
pgd-operator-system → PGD operator (if used)
transporter-* → Transporter services
storage-location-operator → Storage location services
upm-beacon, upm-beaco-ff-base, other upm-* namespaces → telemetry and support components

Each service typically runs as a Kubernetes Deployment with associated Pods.
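
As a minimal sketch of how to discover these namespaces on a live cluster, the following helper filters the namespace list by the naming patterns above. The function name hm_namespaces and the regular expression are illustrative assumptions, not part of Hybrid Manager, and your installation may use additional namespaces.

```shell
# List namespaces that match the Hybrid Manager naming patterns described above.
# The regular expression is an assumption derived from those patterns.
hm_namespaces() {
  kubectl get namespaces -o name \
    | sed 's|^namespace/||' \
    | grep -E '^(upm-|postgresql-operator-system$|pgd-operator-system$|transporter-|storage-location-operator$)'
}

# Usage: show the Deployments and Pods in every matching namespace.
# for ns in $(hm_namespaces); do kubectl get deployments,pods -n "$ns"; done
```

Resolving the names first matters because kubectl accepts exactly one namespace in -n and does not expand patterns such as upm-api-*.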

Common symptoms and investigation steps

Hybrid Manager UI not responding

Symptoms:

UI unavailable or slow
5xx errors in browser

Investigation:

Check UI Pod status: kubectl get pods -n upm-ui
Check UI Pod logs: kubectl logs <pod-name> -n upm-ui
Check associated Ingress or LoadBalancer health
Validate that the API services are reachable, because the UI depends on the API
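
The UI checks above could be combined into one helper, sketched here. The function name hm_ui_check is hypothetical, and the sketch assumes the Ingress and Service objects live in the upm-ui namespace, which may not hold for every installation.

```shell
# Run the UI checks in order: Pod status, recent logs, then Ingress and Service state.
hm_ui_check() {
  local ns="upm-ui" pod
  kubectl get pods -n "$ns" -o wide
  # Print the last 50 log lines from every container of each UI Pod.
  for pod in $(kubectl get pods -n "$ns" -o name); do
    kubectl logs "$pod" -n "$ns" --all-containers --tail=50
  done
  # Assumption: the Ingress and Service are defined in the same namespace.
  kubectl get ingress,service -n "$ns"
}

# Usage:
# hm_ui_check
```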

Hybrid Manager API issues

Symptoms:

API returns errors or is unreachable
UI shows partial data or errors

Investigation:

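Check API service Pods: kubectl get pods -n <upm-api-namespace> (kubectl does not expand wildcards in -n, so find the upm-api-* namespace names with kubectl get namespaces)
Check API Pod logs for errors: kubectl logs <pod-name> -n <upm-api-namespace>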
Validate Service Endpoints are correct
Check Postgres operator health, because the API depends on operator state
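
Because the API services span several upm-api-* namespaces, one way to check them all is to resolve the namespace names and loop over them. This is a sketch under that assumption; hm_api_pods is a hypothetical helper name.

```shell
# Resolve the upm-api-* namespaces, then list Pods and Endpoints in each one.
hm_api_pods() {
  local ns
  for ns in $(kubectl get namespaces -o name | sed 's|^namespace/||' | grep '^upm-api-'); do
    echo "== $ns =="
    kubectl get pods -n "$ns"
    # An empty ENDPOINTS column means the Service currently selects no ready Pods.
    kubectl get endpoints -n "$ns"
  done
}

# Usage:
# hm_api_pods
```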

Operator errors

Symptoms:

Postgres clusters not provisioning
Cluster stuck in Pending or Error state
Operator reconciliation errors

Investigation:

Check Postgres operator Pods: kubectl get pods -n postgresql-operator-system
Check operator logs: kubectl logs <pod-name> -n postgresql-operator-system
Check Cluster CR status and Events
Validate associated PVCs and Services
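
The operator investigation could look like the following sketch. The helper name hm_cluster_check is hypothetical, and the resource name cluster is an assumption about how the Cluster CR is registered on your cluster; confirm it with kubectl api-resources before relying on it.

```shell
# Usage: hm_cluster_check <cluster-namespace> <cluster-name>
hm_cluster_check() {
  local ns="$1" name="$2"
  # Operator health first.
  kubectl get pods -n postgresql-operator-system
  # Assumption: the Cluster CR is addressable as "cluster".
  # Confirm with: kubectl api-resources | grep -i cluster
  kubectl describe cluster "$name" -n "$ns"
  # Events sorted by time, so the most recent appear last.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  # PVCs stuck in Pending often explain a cluster stuck in Pending.
  kubectl get pvc,services -n "$ns"
}
```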

Transporter or backup issues

Symptoms:

Backups failing or not visible
Transporter jobs not running

Investigation:

Check Transporter Pods: kubectl get pods -n <transporter-namespace> (run once per transporter-* namespace; kubectl does not expand wildcards in -n)
Check relevant Pod logs for errors
Validate IRSA or Workload Identity configuration
Check object storage access credentials
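
For the identity check, one option is to print the ServiceAccount annotations that IRSA (on EKS) and Workload Identity (on GKE) rely on. This is a sketch only: hm_identity_annotations is a hypothetical helper, and which ServiceAccounts the Transporter services actually use is not stated here.

```shell
# Usage: hm_identity_annotations <namespace>
# Prints each ServiceAccount with its IRSA (EKS) and Workload Identity (GKE)
# annotation. A value of <none> means the annotation is not set.
hm_identity_annotations() {
  kubectl get serviceaccounts -n "$1" \
    -o custom-columns='NAME:.metadata.name,IRSA_ROLE:.metadata.annotations.eks\.amazonaws\.com/role-arn,GCP_SA:.metadata.annotations.iam\.gke\.io/gcp-service-account'
}

# Usage:
# hm_identity_annotations <transporter-namespace>
```

A missing or incorrect role annotation is a common reason for object storage access to be denied even when the bucket policy itself is correct.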

Agent and telemetry issues

Symptoms:

Monitoring not available
Missing metrics or events

Investigation:

Check upm-beacon and upm-beaco-ff-base Pods
Validate Fluent Bit → Loki pipeline (see Use EDB observability stack)
Check Prometheus targets and scraping status
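
A first pass over the telemetry namespaces might look like this sketch, which lists Pods and any Warning Events in each one. The helper name hm_telemetry_check is hypothetical.

```shell
# List Pods and Warning Events in the telemetry namespaces named above.
hm_telemetry_check() {
  local ns
  for ns in upm-beacon upm-beaco-ff-base; do
    echo "== $ns =="
    # Check the READY and RESTARTS columns, not only STATUS.
    kubectl get pods -n "$ns" -o wide
    kubectl get events -n "$ns" --field-selector type=Warning
  done
}
```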

Log locations and patterns

Component logs are typically available in:

Kubernetes logs: kubectl logs
Grafana Explore → Loki: logs aggregated across components

Common log patterns:

Operator reconciliation errors → in operator logs
API call errors → in API service logs
UI errors → in UI service logs
Backup errors → in Transporter or Agent logs
Storage access errors → in Storage location operator logs
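
To narrow kubectl logs output to likely failures, a small filter such as the following can help. The helper name hm_log_errors and the keyword list are assumptions; actual log formats vary by component.

```shell
# Usage: kubectl logs <pod-name> -n <namespace> --since=1h | hm_log_errors
# Keeps lines that mention common failure keywords, case-insensitively.
hm_log_errors() {
  grep -iE 'error|fail|denied|timeout'
}
```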

When to restart components

If a component is consistently failing or stuck:

Follow the Restart Hybrid Manager components procedure.
Monitor after restart to validate behavior.

When to escalate

Escalate to support when:

Persistent errors remain after component restart
Operator unable to reconcile Cluster CR
Data integrity concerns arise
Multiple Hybrid Manager components are failing simultaneously

Capture relevant logs and Events before escalation.
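
One way to capture this data is sketched below. The helper name hm_capture and the default directory hm-diagnostics are hypothetical; support may ask for a different format or for additional data.

```shell
# Usage: hm_capture <namespace> [output-dir]
# Saves Pod status, Events, and per-Pod logs for one namespace.
hm_capture() {
  local ns="$1" pod
  local dest="${2:-hm-diagnostics}/$ns"
  mkdir -p "$dest"
  kubectl get pods -n "$ns" -o wide > "$dest/pods.txt"
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$dest/events.txt"
  for pod in $(kubectl get pods -n "$ns" -o name); do
    # $pod has the form "pod/<name>"; strip the prefix for the file name.
    kubectl logs "$pod" -n "$ns" --all-containers --timestamps > "$dest/${pod#pod/}.log"
  done
}
```

Run it before restarting anything, because a restart discards the logs of the current containers.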

Summary checklist

Understand Hybrid Manager component architecture and namespaces.
Investigate Pods and logs for the relevant component.
Validate networking, storage, and dependent services.
Use the observability stack (Grafana, Loki) to correlate logs and metrics.
Restart components when appropriate.
Escalate complex issues with supporting data.
