Primary active group, DR group v6.3.1
The primary active group, DR group pattern pairs a full-capacity primary site with two data nodes and a reduced-capacity disaster recovery (DR) site with one node. Connection Manager handles automatic failover between the two primary nodes, keeping the primary site available through single node failures without operator intervention.
The DR site receives asynchronous replication and remains on standby for failover. During DR failover, applications switch to the single DR node at reduced capacity until the primary is restored.
The cluster transitions through four states depending on which nodes are available.
- Steady state: two nodes active in the primary, one node receiving replication in the DR location.
- Primary node failure: automatic failover to the second primary node in seconds.
- Primary data center (DC) failure: manual or orchestrated failover to the DR node in minutes.
- Recovery: rebuild the primary DC from the DR node, then fail back.
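The four states above can be sketched as a small classifier. This is an illustrative model only, not part of any product API: the names `ClusterState`, `classify`, and `serving_site` are hypothetical, and the `rebuilding_primary` flag is an assumption used to tell recovery apart from a primary DC failure that is still in progress.

```python
from enum import Enum


class ClusterState(Enum):
    STEADY = "steady state"
    PRIMARY_NODE_FAILURE = "primary node failure"
    PRIMARY_DC_FAILURE = "primary DC failure"
    RECOVERY = "recovery"


def classify(primary_nodes_up: int, dr_node_up: bool,
             rebuilding_primary: bool = False) -> ClusterState:
    """Map node availability to one of the four states listed above."""
    if not 0 <= primary_nodes_up <= 2:
        raise ValueError("this pattern has exactly two primary nodes")
    if not dr_node_up:
        # Loss of the DR node isn't one of the four states described here.
        raise ValueError("DR node down: not covered by this sketch")
    if rebuilding_primary:
        # Assumption: an operator has started rebuilding the primary DC.
        return ClusterState.RECOVERY
    if primary_nodes_up == 2:
        return ClusterState.STEADY
    if primary_nodes_up == 1:
        return ClusterState.PRIMARY_NODE_FAILURE
    return ClusterState.PRIMARY_DC_FAILURE


def serving_site(state: ClusterState) -> str:
    """Site that applications connect to in each state."""
    if state in (ClusterState.STEADY, ClusterState.PRIMARY_NODE_FAILURE):
        return "primary"
    # Applications stay on the DR node until the primary is restored.
    return "dr"


if __name__ == "__main__":
    for args in [(2, True), (1, True), (0, True), (0, True, True)]:
        state = classify(*args)
        print(f"{state.value}: applications use the {serving_site(state)} site")
```

The sketch keeps applications on the primary site for a single node failure and moves them to the DR node only when the whole primary DC is lost, which mirrors the split between automatic and manual failover described above.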
Diagram legend
Symbols represent node types, group containers, and connection types.
When to use this pattern
This pattern suits organizations that need disaster recovery but have a limited budget for DR infrastructure. The primary site has full HA, and the DR site provides data protection and recovery capability at lower cost. It's common in deployments where full-capacity active-active isn't justified.
The advantages of this pattern include the following:
- Lowest-cost setup that includes DR, with data protected in a separate location.
- Full HA within the primary data center, surviving single node failure.
- DR node can be used for read-only reporting and analytics.
- Simpler than full multi-DC active-active.
There are some limitations to keep in mind:
- DR capacity is limited to a single node, with no HA in the DR site.
- DR failover requires manual intervention or orchestration.
- Active-active workloads aren't supported.
- The primary group can have consistency issues during cascading failures or recovery, because replication degrades to asynchronous to preserve high availability.
Recommended commit scopes for this pattern
- Adaptive protect: synchronous replication to both nodes in the primary group, degrading to asynchronous replication for availability on node loss. The most commonly recommended scope for production use.
- Local protect: asynchronous replication with durability only for the local node. Not recommended for production use.
Commit scopes make different trade-offs across performance, durability, consistency, and scalability.
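To make the difference between the two scopes concrete, the following minimal sketch models which nodes a commit waits for before it's acknowledged. It's a simplified illustration, not the product's configuration syntax or API: the function `commit_waits_for` and the node names are hypothetical, and degradation is modeled as a simple reachability check, whereas a real deployment would typically decide this with a timeout.

```python
from __future__ import annotations


def commit_waits_for(scope: str, origin: str, group: list[str],
                     reachable: set[str]) -> set[str]:
    """Return the nodes that must confirm a commit before it's acknowledged.

    scope     -- "adaptive protect" or "local protect"
    origin    -- node where the transaction commits
    group     -- all nodes in the primary group
    reachable -- nodes currently responding (assumed known to the origin)
    """
    if scope == "local protect":
        # Durability on the local node only; replication is asynchronous.
        return {origin}
    if scope == "adaptive protect":
        if all(node in reachable for node in group):
            # Synchronous: every node in the primary group confirms.
            return set(group)
        # Degraded: fall back to asynchronous so commits aren't blocked.
        return {origin}
    raise ValueError(f"unknown scope: {scope}")


if __name__ == "__main__":
    group = ["node-1", "node-2"]  # hypothetical primary group members
    print(commit_waits_for("adaptive protect", "node-1", group, {"node-1", "node-2"}))
    print(commit_waits_for("adaptive protect", "node-1", group, {"node-1"}))
    print(commit_waits_for("local protect", "node-1", group, {"node-1", "node-2"}))
```

In the degraded case, adaptive protect behaves like local protect until the second node returns, which is the window in which the consistency limitation noted earlier applies.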