Troubleshoot Hybrid Manager components

This how-to guide explains how to troubleshoot Hybrid Manager (HM) platform components running on Kubernetes.

This covers platform services such as:

  • Hybrid Manager UI and API services
  • Postgres operator
  • PGD operator
  • Transporter services
  • Storage location operator
  • Agent and telemetry components
  • Other UPM platform services

For guidance on managing Postgres clusters, see Troubleshooting Kubernetes

Understand Hybrid Manager component architecture

Hybrid Manager services are deployed across multiple namespaces:

  • upm-ui → UI service
  • upm-api-* → API services
  • postgresql-operator-system → Postgres operator
  • pgd-operator-system → PGD operator (if used)
  • transporter-* → Transporter services
  • storage-location-operator → Storage location services
  • upm-beacon, upm-beaco-ff-base, other upm-* namespaces → telemetry and support components

Each service typically runs as a Kubernetes Deployment with associated Pods.

Common symptoms and investigation steps

Hybrid Manager UI not responding

Symptoms:

  • UI unavailable or slow
  • 5xx errors in browser

Investigation:

  • Check UI Pod status: kubectl get pods -n upm-ui
  • Check UI Pod logs: kubectl logs <pod-name> -n upm-ui
  • Check associated Ingress or LoadBalancer health
  • Validate API services are reachable — UI depends on API

Hybrid Manager API issues

Symptoms:

  • API returns errors or is unreachable
  • UI shows partial data or errors

Investigation:

  • Check API service Pods: kubectl get pods -n upm-api-*
  • Check API Pod logs for errors: kubectl logs <pod-name> -n <upm-api-*>
  • Validate Service Endpoints are correct
  • Check Postgres operator health — API depends on operator state

Operator errors

Symptoms:

  • Postgres clusters not provisioning
  • Cluster stuck in Pending or Error state
  • Operator reconciliation errors

Investigation:

  • Check Postgres operator Pods: kubectl get pods -n postgresql-operator-system
  • Check operator logs: kubectl logs <pod-name> -n postgresql-operator-system
  • Check Cluster CR status and Events
  • Validate associated PVCs and Services

Transporter or backup issues

Symptoms:

  • Backups failing or not visible
  • Transporter jobs not running

Investigation:

  • Check Transporter Pods: kubectl get pods -n transporter-*
  • Check relevant Pod logs for errors
  • Validate IRSA or Workload Identity configuration
  • Check object storage access credentials

Agent and telemetry issues

Symptoms:

  • Monitoring not available
  • Missing metrics or events

Investigation:

  • Check upm-beacon and upm-beaco-ff-base Pods
  • Validate Fluent Bit → Loki pipeline (see Use EDB observability stack)
  • Check Prometheus targets and scraping status

Log locations and patterns

Component logs are typically available in:

  • Kubernetes logs: kubectl logs
  • Grafana Explore → Loki: logs aggregated across components

Common log patterns:

  • Operator reconciliation errors → in operator logs
  • API call errors → in API service logs
  • UI errors → in UI service logs
  • Backup errors → in Transporter or Agent logs
  • Storage access errors → in Storage location operator logs

When to restart components

If a component is consistently failing or stuck:

When to escalate

Escalate to support when:

  • Persistent errors remain after component restart
  • Operator unable to reconcile Cluster CR
  • Data integrity concerns arise
  • Multiple Hybrid Manager components are failing simultaneously

Capture relevant logs and Events before escalation.

Summary checklist

  • Understand Hybrid Manager component architecture and namespaces.
  • Investigate Pods and logs for relevant component.
  • Validate networking, storage, and dependent services.
  • Use observability stack (Grafana, Loki) to correlate logs and metrics.
  • Restart components when appropriate.
  • Escalate complex issues with supporting data.

Could this page be better? Report a problem or suggest an addition!