Configuring a Hybrid Manager deployment across multiple data centers Innovation Release
To deploy Postgres databases into multiple data centers, you first deploy HM installations into two or three different geographical locations or regions, supporting the Hub and Spoke or Primary and Secondary architecture. You then link these installations to each other so you can manage them from a single HM console. This guide helps you:
- Create all HM installation configuration files and deploy them.
- Connect two or more HM installations (HM Kubernetes clusters) on the same provider/on-prem family. These can be in different geographical regions. For example, you can have a primary and one secondary, or a primary and two secondaries.
- Align object storage (identical `edb-object-storage` secret), so backups/artifacts are usable in all data centers.
- Wire the HM-internal agent (Beacon, or `upm-beacon-agent`) so the secondaries can register to the primary as managed locations and provision there (9445/TCP).
- Prepare a Postgres topology with a primary Postgres cluster in one data center and replica Postgres cluster(s) in the other(s); perform manual failover by promoting replicas.
Before you start
Before starting the multi-DC procedure, ensure you are familiar with:
Prerequisites
Architecture prerequisites
This multi-DC setup follows a Hub and Spoke model, where a single Primary manages several lean Secondaries.
Hub cluster: One Kubernetes cluster to host the Primary HM. This cluster serves as the central management plane and UI.
Spoke clusters: One or two Kubernetes clusters to host Secondary HMs. These act as execution points for your database workloads.
Network connectivity:
- 8444/TCP open between clusters (SPIRE bundle endpoint).
- 9445/TCP from secondaries → primary (Beacon gRPC).
- Same provider/on-prem family (no cross-cloud).
Shared object storage: The standard HM installation requires each individual Kubernetes cluster (HM installation) to have dedicated object storage. In the case of a multi-DC deployment, this object storage is shared between all clusters.
Collect the required information
Prepare two (or three) copies of the HM installation configuration file (`values.yaml`). Name them `primary.yaml` and `secondary.yaml` if you are deploying in two locations, or `primary.yaml`, `secondaryA.yaml`, and `secondaryB.yaml` if you are deploying in three locations.

Domain names:

Each HM installation must be configured with a dedicated domain name, set in the HM installation configuration file as `portal_domain_name`. This parameter is used by the primary and secondaries. Create these domain names for each of the HM installations (two or three, depending on your configuration), and record this information.
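For example, with two locations the domain settings might look like this (the `example.com` names below are placeholders; use the domain names you created):

```yaml
# primary.yaml (placeholder domain)
portal_domain_name: hm-primary.example.com

# secondary.yaml (placeholder domain)
portal_domain_name: hm-secondary.example.com
```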
Object storage across locations
HM uses an object store for backups, artifacts, WAL, and internal bundles. In multi-DC, all HM installations must use the same object store configuration.
Key requirement
All HM installations must have an identical Kubernetes secret named edb-object-storage in the default namespace. Store this secret in both the primary and the secondary location if you are running in two locations, or in the primary and both secondary locations if you have three data centers.
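One way to check that the `.data` payloads match is sketched below. The helper compares the rendered secret data as strings; the kubeconfig context names `hm-primary` and `hm-secondary` in the usage comment are placeholders for your own clusters:

```shell
# Sketch: compare two secrets' .data payloads.
# Assumes 'kubectl get -o jsonpath={.data}' renders keys in a stable (sorted) order,
# so plain string equality is sufficient. Returns non-zero on empty or differing input.
secret_data_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage against real clusters (context names are placeholders):
#   a=$(kubectl --context hm-primary   -n default get secret edb-object-storage -o jsonpath='{.data}')
#   b=$(kubectl --context hm-secondary -n default get secret edb-object-storage -o jsonpath='{.data}')
#   secret_data_matches "$a" "$b" && echo "secrets match"
```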
Parameter uniqueness
Because the two or three clusters involved in this multi-data-center deployment use the same object store, you must ensure that the following parameters differ between all clusters.
| Parameter | Requirement | Why it matters |
|---|---|---|
| `location_id` | Must be unique per location. | Human-readable location identifier. Identifies the specific location in the HM console and API. |
| `internal_backup_folder` | Must be unique per location AND match the format `^[0-9a-z]{12}$`. | Separates database backups so that a restore doesn't pull the wrong data. |
| `metrics_storage_prefix` | Must be unique per location. | Ensures observability data from each site is stored in its own directory. |
| `logs_storage_prefix` | Must be unique per location. | Prevents logs from different locations from overwriting each other in shared storage. |
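A quick pre-install sanity check for the `internal_backup_folder` values can be scripted. The folder names below are hypothetical examples; substitute the values from your own configuration files:

```shell
# One internal_backup_folder value per location (hypothetical examples).
# Left unquoted below on purpose so the shell splits it into words.
backup_folders="a1b2c3d4e5f6 f6e5d4c3b2a1"

# Each value must match ^[0-9a-z]{12}$.
for f in $backup_folders; do
  echo "$f" | grep -Eq '^[0-9a-z]{12}$' || { echo "bad format: $f" >&2; exit 1; }
done

# The values must also be unique across locations.
dupes=$(printf '%s\n' $backup_folders | sort | uniq -d)
[ -z "$dupes" ] || { echo "duplicate folders: $dupes" >&2; exit 1; }
echo "internal_backup_folder values are valid and unique"
```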
Configure the multi-DC topology
In this step, you define the Hub-and-Spoke relationship. The Hub (primary) must know about all its Spokes (secondaries), and each Spoke must know how to reach the Hub.
Add the following stanza to each of your configuration files (primary.yaml and secondary.yaml):
```yaml
clusterGroups:
  # Set to 'primary' for your Hub, 'secondary' for your Spokes
  role: (secondary|primary|standalone)
  primary:
    domainName: <primary portal domain>
  secondaries:
    - domainName: <secondary portal domain>
    # Add an additional entry here for additional locations
```
Fill in the missing domainName parameters using the portal_domain_name parameter you previously set in each HM installation configuration file.
In primary.yaml: Set role: primary. This cluster will act as the Hub.
In secondary.yaml: Set role: secondary. This cluster will act as a Spoke and use the primary.domainName to find its manager.
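Putting this together for a two-location setup, the stanzas might look like the following sketch (the domain names are placeholders; use your own `portal_domain_name` values):

```yaml
# primary.yaml (Hub)
clusterGroups:
  role: primary
  primary:
    domainName: hm-primary.example.com
  secondaries:
    - domainName: hm-secondary.example.com

# secondary.yaml (Spoke)
clusterGroups:
  role: secondary
  primary:
    domainName: hm-primary.example.com
  secondaries:
    - domainName: hm-secondary.example.com
```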
Reduced set of components (Spoke clusters only)
The HM consists of a number of different components, some of which are not necessary in the secondary locations. While the full set can be installed successfully in secondary locations, we recommend reducing that list by setting the `scenarios` parameter to just `core`, and disabling the HM console (UI).
Add the following parameters in the values file for secondary locations:
```yaml
scenarios: 'core'
disabledComponents:
  - upm-ui
```
Validation checklist:
- The `edb-object-storage` secret must be identical across all locations (compare `.data` only).
- All locations can list/write the bucket (quick Pod/Job test).
- `location_id`, `internal_backup_folder`, `metrics_storage_prefix`, and `logs_storage_prefix` must be unique per location.
- `scenarios` has been set to `core` for secondary locations.
- `disabledComponents` has the `upm-ui` component listed for secondary locations.
Hybrid Manager installation
Using the HM installation configuration files primary.yaml and secondary.yaml (or primary.yaml, secondaryA.yaml, and secondaryB.yaml for a three-location setup), install the Hybrid Manager using Helm.
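The installs might look like the following sketch. The release name, chart reference, and namespace are placeholders; use the exact Helm command from your HM installation instructions:

```shell
# On the Hub cluster (kubectl/helm context pointing at the primary):
helm upgrade --install hybrid-manager <HM_CHART> \
  --namespace <HM_NAMESPACE> --create-namespace \
  -f primary.yaml

# On each Spoke cluster, use the corresponding secondary values file:
helm upgrade --install hybrid-manager <HM_CHART> \
  --namespace <HM_NAMESPACE> --create-namespace \
  -f secondary.yaml
```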
Validate wiring
Once the Helm installations are complete, verify the Hub and Spoke link.
On the primary, list the managed secondary locations:
```shell
kubectl get location
```
You should see all locations as `managed-<SECONDARY_LOCATION_NAME>` with a recent `LASTHEARTBEAT`.

Validate that SPIRE federation is present in all locations:
```shell
kubectl -n spire-system exec svc/spire-server -c spire-server -- \
  /opt/spire/bin/spire-server federation list
```
You should see a federation list showing the relationships (the peer trust domain) with `bundle endpoint profile: https_spiffe` and the peer's `:8444` URL.

On the primary, you should see one or two entries in the federation list (one for each secondary). On each secondary instance, you see only one entry (the relationship back to the primary).
Create a Postgres database cluster across different data centers
HM can now provision into the secondary locations, but you must still choose and create the actual database topology. In the HM console, create a database cluster and make sure to select different locations for each database node.
For single-node and high availability database clusters, during cluster creation, use the Replica Clusters tab to add replica database clusters in other locations.
For advanced high availability and distributed high availability database clusters, during cluster creation, use the Data Groups tab > Node Settings > Deployment Location option to distribute the cluster data groups across locations.
Note
Witness nodes in PGD require a minimum of 10GB of disk space.
Ensure backups are writing to the shared object store from both data centers.
Operational notes
- DB TLS is separate from SPIRE/Beacon (platform identity). Configure PG TLS per your policy.
- Verify StorageClasses in each DC meet PG IOPS/latency.
- Open replication ports between sites.
Validation (end-to-end)
On the primary location, perform the following checks to validate the multi-DC setup:
Validate primary/secondary cluster relationships:
```shell
kubectl -n spire-system exec svc/spire-server -c spire-server -- \
  /opt/spire/bin/spire-server federation list
```
Validate that the secondary locations are registered:
```shell
kubectl get location
```
Validate provisioning to secondary works:
- From the primary, deploy a small test workload to a secondary location.
- Telemetry (optional): Thanos stores show the federated peer; Loki queries return logs tagged from the secondary.
- Object storage: both clusters can read/write the bucket; secrets are identical.
Manual failover
Manual failover procedure for databases from the primary location to a secondary location
Suspend writes to the primary location (maintenance mode/LB cutover).
Promote a replica in a secondary location to primary (using the HM console or your scripts).
Redirect clients (DNS/LB) to the secondary location.
Observe: confirm that writes succeed and that the replication role is updated.
When the original primary location returns, re-seed it as a replica of the new primary; optionally plan a later cutback.
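The promotion and write checks above can be spot-checked from a client. `pg_is_in_recovery()` returns `f` on a writable primary and `t` on a replica; the connection parameters below are placeholders:

```shell
# Expect 'f' after a successful promotion in the secondary location.
psql "host=<secondary-db-endpoint> user=<app-user> dbname=<app-db>" \
  -tAc "SELECT pg_is_in_recovery();"

# Confirm a write succeeds against the promoted node.
psql "host=<secondary-db-endpoint> user=<app-user> dbname=<app-db>" \
  -c "CREATE TEMP TABLE failover_check (ok boolean);"
```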
Note
While you promote the Postgres database in the secondary DC, the HM console itself still runs on the primary. If the primary DC (where HM lives) is completely gone, follow the HM disaster recovery guide to restore the management console elsewhere.
Operator tips
- Keep DNS TTL low enough for cutovers.
- Track downtime to measure RTO.
- Validate backups post-promotion.
Troubleshooting
Problem: No federation relationships
- Re-generate and cross-apply `ClusterFederatedTrustDomain` CRs.
- Confirm 8444/TCP reachability.
Problem: Secondary not listed in `kubectl get location`

- Recheck Beacon values on both sides; restart the Beacon server/agent.
- Confirm 9445/TCP reachability to the primary portal, and that the trust domains are correct.
Problem: Object store access fails on a secondary

- Re-sync the `edb-object-storage` secret.
- For EKS/IRSA: ensure the secondary's OIDC provider is in the role's trust policy.
Problem: Telemetry federation missing
- Reinstall with the correct `-l primary|secondary` flags and unique prefixes.
- Check the Thanos `/api/v1/stores` endpoint and the Loki read API.
Problem: Replica lag / connectivity
- Verify network ACLs/SGs, TLS certs, and storage performance.
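To quantify replica lag, you can query the `pg_stat_replication` view on the current primary (the lag columns shown exist in PostgreSQL 10 and later):

```shell
psql -tA -c "SELECT application_name, client_addr, write_lag, replay_lag
             FROM pg_stat_replication;"
```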
Appendix B — Quick daily checks
- `kubectl get location` on the primary shows the secondary as Ready.
- Thanos/Loki federation healthy (if enabled).
- Object store writes succeed from both DCs.
- Replication lag within SLOs.