Hybrid Manager disaster recovery v1.3.5

The Disaster Recovery (DR) procedure is the series of manual steps you need to take to recover your HM installation and your HM-managed Postgres clusters.

Warning

You must regularly test and update your organization's DR procedure for it to remain valid.

Before you start

Before starting the DR procedure, make sure you are familiar with the following prerequisites and required tools.

Prerequisites

  • A new HM instance deployed and running. It must be running the same version as the old instance that failed or became unavailable.

  • The container images used to build the clusters in the old, unavailable HM instance must be available to the new instance.

Required tools

Ensure the following tools are available on your workstation or bastion host (you can verify them with the quick check after this list):

  • Velero CLI

  • jq (Command-line JSON processor)

  • yq (Command-line YAML processor)
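
A quick way to verify the tools are installed (a minimal check; the exact version requirements depend on your HM release):

velero version --client-only
jq --version
yq --version
# kubectl is also used throughout this guide
kubectl version --client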

1. Make backups available in the new HM instance

The first step makes the backups of the unavailable HM instance (“old backups”) reachable from the new HM instance by copying them to the linked storage of the new HM instance (the “new bucket”).

  1. Obtain the bucket names, the backup ID, and the region of the new bucket. Store these as environment variables to be used in the commands throughout this guide:

    Important

    These variables are session-specific. If you open a new terminal tab or your session times out, the variables will be lost and subsequent commands will fail. To avoid re-typing these values, you can save these export commands in a small shell script (e.g., dr-env.sh) and then source that file in any new terminal window to instantly reload your environment, as shown in the example below.

    export OLD_BUCKET=<old_bucket>
    export OLD_BACKUP_ID=<old_environment_internal_backup_id>
    export NEW_BUCKET=<new_bucket>
    export NEW_REGION=<region_of_the_new_bucket>
    How do I obtain the old bucket values?

    To obtain the old bucket values:

    1. Go to your cloud service provider (CSP) console or dashboard and open the buckets section.
    2. Find and select the bucket linked to the backups of your old HM instance.
    3. Browse to the edb-internal-backups folder. Inside it you will find a subfolder named after the backup ID, e.g. 4be7a1c8c9f0.

    EKS Example

    This is an example for setting the environment variables for an HM instance deployed on EKS:

    export OLD_BUCKET=eks-1105143903-2511-edb-postgres
    export OLD_BACKUP_ID=a7462dbc7106
    export NEW_BUCKET=eks-1105155418-2511-edb-postgres
    export NEW_REGION=eu-west-3
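
    If you saved the exports to dr-env.sh as suggested above, you can reload them in any new terminal with:

    source ./dr-env.sh
    # confirm the variables are set
    echo "${OLD_BUCKET} ${OLD_BACKUP_ID} ${NEW_BUCKET} ${NEW_REGION}"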

  2. To copy the data from the old bucket to the new bucket, you first need to locate and note the names of the source and target folders. You need to copy the following folders and their content:

    • Internal EDB backups folder: the internal backups folder in the old bucket, edb-internal-backups/<random-string>, differs from the one in the new HM instance, which has its own, different <random-string>.

    • Postgres cluster backups folder: customer-pg-backups.

    • Folders for any defined custom storage locations: if you use Managed Storage Locations in the HM console (e.g., for offloading Postgres queries), make sure the corresponding folders are copied from the old S3-compatible bucket to the new one. The location definitions are restored via Velero, but the actual data inside those custom folders must be migrated manually to the new target bucket.

  3. Copy the old backups to the new bucket using your preferred tools. Here are some examples using cloud service provider CLIs to move data between buckets:
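
    For instance, on AWS you could use the AWS CLI along these lines (a sketch only; the destination folder names, in particular the internal backup folder, are assumptions you should verify for your environment before copying):

    # Internal EDB backups (assumption: the old backup ID is kept as the folder
    # name in the new bucket so the restore can find it; verify the expected path)
    aws s3 sync "s3://${OLD_BUCKET}/edb-internal-backups/${OLD_BACKUP_ID}/" \
                "s3://${NEW_BUCKET}/edb-internal-backups/${OLD_BACKUP_ID}/"

    # Postgres cluster backups
    aws s3 sync "s3://${OLD_BUCKET}/customer-pg-backups/" \
                "s3://${NEW_BUCKET}/customer-pg-backups/"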

  4. Load the backups you just copied into your new HM instance by creating a new backup storage location custom resource and applying it to the new HM instance:
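
    The exact manifest depends on your HM version and cloud provider. As an illustration only, a Velero BackupStorageLocation named recovery (the name the later commands in this guide select on), pointing at the copied internal backups, could look like this; the provider, prefix, and config values are assumptions to adapt:

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1
    kind: BackupStorageLocation
    metadata:
      name: recovery
      namespace: velero
    spec:
      provider: aws
      accessMode: ReadOnly
      objectStorage:
        bucket: ${NEW_BUCKET}
        prefix: edb-internal-backups/${OLD_BACKUP_ID}
      config:
        region: ${NEW_REGION}
    EOF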

  5. Confirm that the new storage location is available:

    velero get backup-locations

    If the status is not Available, check the Velero pod logs for permission errors on the S3 bucket.

  6. Confirm that the backups are available as well:

    velero get backups --selector velero.io/storage-location=recovery
  7. Choose the backup you want to restore from. You can have multiple backups available, so choose the one that best suits your needs, e.g. the most recent backup before the disaster happened. Note the Velero backup name, as well as the date and time (UTC), as both are required for a restore, for example:

    NAME                                      STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
    velero-backup-kube-state-20241216154403   Completed   0        0          2024-12-16 16:44:03 +0100 CET   5d        recovery           <none>
    Note

    The timestamp value is referred to as the recovery date in the instructions that follow.

  8. (Optional) If you were using HM to manage AI workloads, e.g. with the GenAI Builder, also copy the object store files and CORS configuration from the old bucket to the new one:

    export OLD_BUCKET_DATALAKE=<old_datalake_bucket>
    export NEW_BUCKET_DATALAKE=<new_datalake_bucket>
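
    For example, with the AWS CLI (a sketch; adjust to your CSP and verify that the copied CORS rules still make sense for the new environment):

    # Copy the object store files used by the AI workloads
    aws s3 sync "s3://${OLD_BUCKET_DATALAKE}/" "s3://${NEW_BUCKET_DATALAKE}/"

    # Copy the CORS configuration from the old bucket to the new one
    aws s3api get-bucket-cors --bucket "${OLD_BUCKET_DATALAKE}" > cors.json
    aws s3api put-bucket-cors --bucket "${NEW_BUCKET_DATALAKE}" --cors-configuration file://cors.json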

2. Recovery steps

Restore HM-internal databases

After the old backups are available in the new bucket, you can restore the HM-internal databases. These are back-end services used by HM and are required to fully restore the HM instance. Depending on the HM version you are using and on the installation scenario you have deployed, the list of databases may vary.

  1. To simplify this process, run the following script with your kubeconfig pointing to your new HM installation:

    patch-clusters.sh

    Script details

    This patch script takes care of:

    • Saving the HM-internal database cluster manifests to YAML files, generating two directories:
      • old-cluster-configs, containing the current state of the database clusters in the new HM installation (the default configuration after installation).
      • new-cluster-configs, containing the same files after the script applies the patches required for the HM-internal databases to start using the data from the backups.
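
    To review what the script changed before going further, you can compare the two directories, for example:

    diff -ru old-cluster-configs new-cluster-configs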

  2. Suspend the reconciliation of all HM-internal database clusters so that you can safely remove their existing custom resources (CRs) without the operator recreating them by default:

    HCP_CR=$(kubectl get hybridcontrolplanes.edbpgai.edb.com -A -o json | jq -rc '.items[0] | .metadata.name')
    for CLUSTER in $(kubectl get clusters.postgresql.k8s.enterprisedb.io -A -o json | jq -rc '.items[].metadata | select((.name | test("^p-") | not) and (.name != "stats-collector-db")) | {namespace: .namespace}' | uniq)
    do
     NAMESPACE=$(echo "${CLUSTER}" | jq -rc '.namespace')
     INDEX=$(kubectl get hybridcontrolplane ${HCP_CR} -o json | jq '.status.components | to_entries[] | select(.value.name=='\"${NAMESPACE}\"') | .key')
     kubectl patch hybridcontrolplane ${HCP_CR} --subresource=status --type=json -p "[{\"op\": \"replace\", \"path\": \"/status/components/$INDEX/suspended\", \"value\": true}]"
    done
  3. Verify that the components have been suspended correctly:

    HCP_CR=$(kubectl get hybridcontrolplanes.edbpgai.edb.com -A -o json | jq -rc '.items[0] | .metadata.name')
    kubectl hcp status -n edbpgai-bootstrap "${HCP_CR}"
  4. Delete the HM-internal database clusters that were created during installation of the new HM instance to make room for the HM-internal database clusters that will be recovered from the backup:

    for CONFIG in $(find new-cluster-configs -type f)
    do
        kubectl delete -f $CONFIG
    done
  5. Clean the backup area that was created during the installation of the new HM instance to avoid confusion with the old backups that you want to restore:
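
    The exact objects to remove depend on your environment. As a hedged sketch, assuming the fresh installation created its own folder under edb-internal-backups in the new bucket, you could first list the folders and only then delete the one belonging to the new installation (verify the path carefully, and do not remove the folder you copied the old backups into):

    # List the internal backup folders in the new bucket
    aws s3 ls "s3://${NEW_BUCKET}/edb-internal-backups/"

    # Remove the folder created by the fresh installation
    # (replace <new_backup_id> with the folder you verified above)
    aws s3 rm "s3://${NEW_BUCKET}/edb-internal-backups/<new_backup_id>/" --recursive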

  6. Apply the YAML files so that all the HM-internal database clusters are re-created with the backup data:

    for CONFIG in $(find new-cluster-configs -type f)
    do
        kubectl apply -f $CONFIG
    done

    You can monitor the restore progress using kubectl get clusters -A.
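
    For example, to watch the clusters until they all report a healthy state:

    kubectl get clusters.postgresql.k8s.enterprisedb.io -A -w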

Restart HM services

After all HM-internal database clusters are successfully restored and reporting a healthy state, perform this one-time restart of the management server to refresh the HM console:

kubectl delete pods $(kubectl get pods -n upm-beaco-ff-base | grep '^accm-server' | awk '{print $1}') -n upm-beaco-ff-base

Wait for the new pod to reach the Running state. At this point, the HM console is available, though it won't yet show your HM-managed Postgres clusters.
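
For example, you can watch the replacement pod come up with:

kubectl get pods -n upm-beaco-ff-base -w | grep '^accm-server'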

Configure the Velero plugin

The Velero plugin handles the transformation of Kubernetes resources during the restore. Most importantly, it ensures Postgres clusters are restored in a state that allows you to manually trigger their data recovery.

  1. List the available backups and note the Name and Timestamp of your preferred recovery point:

    velero get backups -o json --selector velero.io/storage-location=recovery \
    | jq -rc '(["Name", "Timestamp"]), (.items // [.] | .[] | [.metadata.name, .metadata.creationTimestamp]) | @tsv' \
    | column -t -s "$(printf '\t')"
  2. Export the environment variables:

    export BACKUP_TIMESTAMP=<recovery date in YYYY-MM-DDTHH:MM:SSZ format>
    export BACKUP_NAME=<selected name>
    # These environment variables should already be available in your terminal
    export OLD_BUCKET=<old bucket name>
    export NEW_BUCKET=<new bucket name>
    Note

    The BACKUP_TIMESTAMP must be the exact ISO timestamp (e.g., 2024-12-16T15:44:03Z) found in the previous step.
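
    If you only noted the local-time CREATED value from the velero output, you can convert it to the required UTC format, for example with GNU date (an assumption; BSD/macOS date uses different flags):

    export BACKUP_TIMESTAMP=$(date -u -d "2024-12-16 16:44:03 +0100" +"%Y-%m-%dT%H:%M:%SZ")
    echo "${BACKUP_TIMESTAMP}"   # prints 2024-12-16T15:44:03Z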

  3. Create and apply a ConfigMap to configure the Velero plugin:

    kubectl apply -f - <<EOF
    apiVersion: v1  
    kind: ConfigMap  
    metadata:  
      name: velero-plugin-for-edbpgai  
      namespace: velero  
      labels:  
        velero.io/plugin-config: ""  
        enterprisedb.io/edbpgai-plugin: RestoreItemAction  
    data:  
      # configure disaster recovery mode, so restored items are transformed as needed  
      drMode: "true"  
      # configure a date corresponding to the velero backup date. Note the format!  
      drDate: "${BACKUP_TIMESTAMP}"  
      # old and new buckets for internal custom storage locations  
      oldBucket: ${OLD_BUCKET}  
      newBucket: ${NEW_BUCKET}
    EOF

Restore resources

  1. Restore Managed Storage Locations by applying the following Velero restore. This includes the default managed-devspatcher location as well as any additional custom-defined locations.

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1  
    kind: Restore  
    metadata:  
      name: restore-1-storagelocations  
      namespace: velero  
    spec:  
      backupName: "${BACKUP_NAME}" 
      includedResources:  
       - storagelocations.biganimal.enterprisedb.com  
      includeClusterResources: true  
      labelSelector:  
        matchLabels:  
          biganimal.enterprisedb.io/reserved-by-biganimal: "false"
    EOF
  2. Configure and apply the following Velero restore resource manifest to restore the cluster wrappers:

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1  
    kind: Restore  
    metadata:  
      name: restore-2-clusterwrappers  
      namespace: velero  
    spec:  
      backupName: "${BACKUP_NAME}" 
      includedResources:  
       - clusterwrappers.beacon.enterprisedb.com  
      restoreStatus:  
        includedResources:  
         - clusterwrappers.beacon.enterprisedb.com
    EOF
  3. Monitor the restore progress. You must wait until the cluster wrappers are restored before continuing, because the custom resources (CRs) restored in the following steps depend on them. If the corresponding clusterwrapper isn't found, HM could delete the other CRs.

    velero get restore restore-2-clusterwrappers
  4. After the cluster wrappers are restored, configure and apply the following Velero resource manifest to restore the backup wrappers:

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1  
    kind: Restore  
    metadata:  
      name: restore-3-backupwrappers  
      namespace: velero  
    spec:   
      backupName: "${BACKUP_NAME}" 
      includedResources:  
       - backupwrappers.beacon.enterprisedb.com  
      restoreStatus:  
        includedResources:  
         - backupwrappers.beacon.enterprisedb.com
    EOF
  5. Configure and apply the following Velero resource manifest to restore Griptape, Lakekeeper and Dex secrets:

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: restore-4-required-secrets
      namespace: velero
    spec:
      backupName: "${BACKUP_NAME}"
      includedNamespaces:
      - upm-griptape
      - upm-lakekeeper
      - upm-dex
      includedResources:
      - secrets
      includeClusterResources: false
    EOF
  6. (Optional) If you are running AI workloads, configure and apply the following Velero restore resource manifest to restore kserve resources:

    kubectl apply -f - <<EOF
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: restore-5-kservecrs
      namespace: velero
    spec:
      backupName: "${BACKUP_NAME}"
      includedResources:
      - clusterservingruntimes.serving.kserve.io
      - inferenceservices.serving.kserve.io
    EOF
  7. Monitor all restores and wait for them to be completed:

    velero get restores

3. Restore Postgres clusters

The cluster metadata has been restored, but the HM-managed Postgres clusters must be manually re-provisioned to link back to your data.

  1. In the HM console, navigate to the databases section. You will see your original clusters listed with a status of Deleted.

  2. Select the desired cluster and locate the Restore button. Follow the prompts to create a new cluster. During this process, the system will use your previous backups to populate the new instance.

  3. After provisioning is complete, verify that the data matches your original state.

You can apply the same procedure to restore any Postgres clusters you had configured on a secondary location.

Note

AI components (such as the GenAI Builder UI in the Launchpad section) will automatically reappear in the HM console once the restore is initiated. Due to the large size of container images and profiles, synchronization may take some time.

4. Validate the restore

The restoration procedure is now complete. To ensure a successful recovery, we recommend checking for data integrity. Log in to the newly provisioned Postgres cluster and run a few test queries to confirm your data is current and accessible.
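
For example, using psql (the connection details and table name below are placeholders for your own environment):

psql "host=<restored_cluster_host> user=<db_user> dbname=<db_name> sslmode=require" \
  -c "SELECT now(), count(*) FROM <an_important_table>;"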

Tip

If you are performing this as part of a DR drill, internally document the total "Time to Restore" (TTR) for both the database and AI layers to help refine your recovery time objective (RTO).