Configuring backups to support disaster recovery (DR) v1.3.5
A successful disaster recovery (DR) outcome is directly tied to a robust backup strategy. While HM provides a default backup configuration, you should evaluate and implement supplemental schedules to ensure the environment aligns with your specific recovery time objective (RTO) and recovery point objective (RPO) requirements. HM backups are handled with Velero.
RTO and RPO
The ability to perform a restore, and the associated RTO and RPO, depend on the frequency and size of the backups.
The RTO will be determined by the time it takes to deploy a new HM instance and restore the HM control plane and data plane (Postgres database clusters) from the available backups. The RTO will be affected by the size of the Postgres clusters, the number of Postgres clusters, and the size of the HM control plane data.
The RPO will be determined by the frequency of the backups and the backup lifecycle policies. Critical data, such as the definition of the Postgres clusters, is stored as Kubernetes objects and included in the Velero backup. By default, this backup runs daily at 23:00, as defined by the default velero-backup-kube-state schedule. If your RPO requires more frequent backups, complement the default schedule by creating an additional custom backup schedule.
Note
While RTO and RPO targets are defined by your organization's service level requirements, the actual time and data recovery points achieved during a disaster will vary based on data volume and the manual nature of this procedure. We strongly recommend that you perform periodic disaster recovery exercises to validate that these manual steps can meet your defined RTO and RPO objectives.
Backup readiness
During installation, you configure an S3-compatible storage for your HM instance that stores:
- Internal backups (HM control plane data)
- Postgres backups (Postgres database backups or data plane data)
There is a default storage location in your bucket for these backups. However, HM also supports storing backup data in custom storage locations (Managed Storage Locations) in the same bucket. If you use custom storage locations, ensure these directories are included in your cross-region replication strategy so that they remain available during a regional outage.
All of this data needs to be available after a disaster. Depending on the criticality of the data and the level of disaster that you want to be able to recover from, you’ll need to replicate this data outside of the CSP region or physical data center where the HM instance resides.
Tip
When using a bucket, you can achieve this with cross-region replication. For example, for AWS S3 buckets, see the AWS documentation on cross-region replication.
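As a rough sketch of the AWS case, the script below writes a minimal replication configuration and wraps the AWS CLI calls that enable versioning (which S3 replication requires on both buckets) and apply the rule. The bucket names, region, IAM role ARN, and helper function names are hypothetical placeholders, not values from HM. The sketch assumes the destination bucket already exists in another region and that the IAM role grants S3 the permissions needed for replication.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is a hypothetical placeholder.
SOURCE_BUCKET="${SOURCE_BUCKET:-hm-backups-primary}"   # bucket configured for HM (assumed name)
DEST_BUCKET="${DEST_BUCKET:-hm-backups-replica}"       # existing bucket in another region (assumed name)
DEST_REGION="${DEST_REGION:-us-west-2}"                # region of the destination bucket (assumed)
REPLICATION_ROLE_ARN="${REPLICATION_ROLE_ARN:-arn:aws:iam::111122223333:role/hm-replication}"  # assumed role

# Write a replication rule with an empty filter, so every prefix in the bucket
# (default and custom storage locations alike) is covered by the rule.
write_replication_config() {
  cat > "$1" <<EOF
{
  "Role": "${REPLICATION_ROLE_ARN}",
  "Rules": [
    {
      "ID": "hm-backups-cross-region",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::${DEST_BUCKET}" }
    }
  ]
}
EOF
}

# Enable versioning on both buckets, then apply the replication rule to the source.
apply_replication() {
  local config_file="$1"
  aws s3api put-bucket-versioning --bucket "$SOURCE_BUCKET" \
    --versioning-configuration Status=Enabled
  aws s3api put-bucket-versioning --bucket "$DEST_BUCKET" --region "$DEST_REGION" \
    --versioning-configuration Status=Enabled
  aws s3api put-bucket-replication --bucket "$SOURCE_BUCKET" \
    --replication-configuration "file://${config_file}"
}
```

To use it, run write_replication_config replication.json followed by apply_replication replication.json, then confirm the rule with aws s3api get-bucket-replication --bucket "$SOURCE_BUCKET". Keep in mind that a replication rule applies to objects written after it is enabled. Backups already in the bucket need a separate mechanism such as S3 Batch Replication.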
Default backup configuration
HM employs a dual-layered backup strategy to protect both the control plane and your data plane:
- Postgres database data: Managed via continuous backups. This ensures that once a cluster is restored, your data can be recovered to a specific point in time, limited only by your backup lifecycle policies.
- HM configuration data: Managed via Velero. This captures the "blueprints" of your environment, including the definitions of your Postgres clusters. By default, this occurs daily at 23:00 via the velero-backup-kube-state schedule.
If your RPO requires more frequent snapshots of your cluster definitions (for example, if you create or delete clusters frequently), you can create a custom Velero schedule.
Danger
Don't modify the default schedule, as it may be overwritten by an HM software update.
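Rather than editing the default schedule, you can inspect it read-only to confirm its current cron expression and retention before deciding whether a supplemental schedule is needed. The sketch below reuses the velero get schedules and yq pattern that this page uses for custom schedules. The function name is a hypothetical helper, and it assumes that velero and yq are on your PATH and configured against the HM cluster.

```shell
# Hypothetical helper: print the spec of a Velero schedule without changing it.
# With no argument, it defaults to the built-in schedule named on this page.
show_schedule_spec() {
  local schedule_name="${1:-velero-backup-kube-state}"
  velero get schedules "$schedule_name" -o yaml | yq .spec
}
```

Calling show_schedule_spec with no argument prints the default schedule's spec. Passing a schedule name prints that schedule instead.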
Adding a custom backup schedule
The following command creates a supplemental backup schedule that runs every 6 hours in addition to the default system backups. This ensures more frequent snapshots of your metadata without interfering with the built-in velero-backup-kube-state schedule. Note that --snapshot-volumes=false is used because the actual Postgres data is already protected by the continuous backup stream in your object storage.
velero create schedule <backup-name> --schedule="0 */6 * * *" --include-namespaces='*' --include-resources='*' --snapshot-volumes=false --ttl=168h --skip-immediately=false
To check how the backup schedule was configured in the backend, run the following command:
velero get schedules <backup-name> -o yaml | yq .spec
This should return output similar to the following:
schedule: 0 */6 * * *
skipImmediately: false
template:
  csiSnapshotTimeout: 0s
  hooks: {}
  includedNamespaces:
    - '*'
  includedResources:
    - '*'
  itemOperationTimeout: 0s
  metadata: {}
  snapshotVolumes: false
  ttl: 168h0m0s
useOwnerReferencesInBackup: false
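Beyond checking the schedule's spec, it is worth confirming that the schedule actually produces backups. Velero labels each backup created by a schedule with velero.io/schedule-name, so a label selector can filter them. The helper name below is hypothetical, and the sketch assumes velero is configured against the HM cluster.

```shell
# Hypothetical helper: list the backups that a given Velero schedule has created,
# using the velero.io/schedule-name label that Velero sets on scheduled backups.
list_schedule_backups() {
  local schedule_name="$1"
  velero backup get --selector "velero.io/schedule-name=${schedule_name}"
}
```

Run list_schedule_backups <backup-name> with the same name you gave the schedule. Because the schedule was created with --skip-immediately=false, a first backup would normally be expected soon after creation. Each listed backup should eventually report a completed status.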
Next topic
Perform a disaster recovery of your HM instance and Postgres clusters using Velero.