Consensus layer considerations v4
HARP is designed to work with different implementations of the consensus layer, also known as Distributed Control Systems (DCS).
Currently, the supported DCS implementations are etcd and BDR.
This information is specific to HARP's interaction with the supported DCS implementations.
For the purpose of maintaining a voting quorum, BDR Logical Standby nodes and BDR Subscriber-Only nodes don't participate in consensus communications in an EDB Postgres Distributed cluster. Don't count these nodes toward the total when working out DCS quorum requirements.
Clusters of any architecture require at least n/2 + 1 nodes to maintain consensus via a voting quorum. Thus a three-node cluster can tolerate the outage of a single node, a five-node cluster can tolerate a two-node outage, and so on. If consensus is ever lost, HARP becomes inoperable because the DCS prevents it from deterministically identifying the node that is the lead master in a particular location.
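The quorum arithmetic can be checked with a short Python sketch. It assumes that n/2 + 1 is read with integer division (that is, a strict majority of the voting nodes). The function names are illustrative and aren't part of HARP.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest strict majority of the voting nodes: floor(n/2) + 1."""
    if voting_nodes < 1:
        raise ValueError("a cluster needs at least one voting node")
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """Number of voting nodes that can be lost while a majority remains."""
    return voting_nodes - quorum_size(voting_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

A four-node cluster tolerates the same single-node outage as a three-node cluster, because the extra node also raises the majority threshold.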
As a result, whichever DCS is chosen, more than half of the nodes must always be available cluster-wide. This can become a non-trivial element when distributing DCS nodes among two or more data centers. A network partition prevents quorum in any location that can't maintain a voting majority, and thus HARP stops working.
Thus an odd number of nodes (with a minimum of three) is crucial when building the consensus layer. An ideal case distributes nodes across a minimum of three independent locations to prevent a single network partition from disrupting consensus.
One example configuration is to designate two DCS nodes in each of two data centers coinciding with the primary BDR nodes, and a fifth DCS node (such as a BDR witness) elsewhere. Using such a design, a network partition between the two BDR data centers doesn't disrupt consensus thanks to the independently located node.
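To see why the independently located node matters, the following Python sketch counts the votes that remain reachable after a location is cut off. The location names and the has_quorum helper are hypothetical and serve only to illustrate the majority rule. They aren't part of HARP or any DCS.

```python
from typing import Iterable, Mapping


def has_quorum(layout: Mapping[str, int], reachable: Iterable[str]) -> bool:
    """Return True if the reachable locations hold a strict majority.

    layout maps a location name to its number of DCS voting nodes.
    reachable lists the locations that can still talk to each other.
    """
    total = sum(layout.values())
    votes = sum(layout[loc] for loc in reachable)
    return votes >= total // 2 + 1


# Two DCS nodes in each data center plus one witness elsewhere.
three_site = {"dc-a": 2, "dc-b": 2, "witness": 1}
# The same five nodes squeezed into two locations.
two_site = {"dc-a": 3, "dc-b": 2}

if __name__ == "__main__":
    print(has_quorum(three_site, ["dc-a", "witness"]))  # True: 3 of 5 votes
    print(has_quorum(three_site, ["dc-b"]))             # False: 2 of 5 votes
    print(has_quorum(two_site, ["dc-b"]))               # False: 2 of 5 votes
```

With only two locations, losing the larger one always loses the majority. With a third location, either data center can pair with the witness and keep consensus.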
HARP assumes one lead master per configured location. Normally each location is specified in HARP using the location configuration setting. By creating a separate DCS cluster per location, you can emulate this behavior independently of HARP.
To accomplish this, configure HARP in config.yml to use a different DCS connection target per desired location.
HARP nodes in DC-A use something like this:
While DC-B uses different hostnames corresponding to nodes in its canonical location:
There's no DCS communication between different data centers in this design, and thus a network partition between them doesn't affect HARP operation. A consequence of this is that HARP is completely unaware of nodes in the other location, and each location operates essentially as a separate HARP cluster.
This isn't possible when using BDR as the DCS, as BDR maintains a consensus layer across all participant nodes.
A possible drawback to this approach is that harpctl can't interact with nodes outside of the current location. It's impossible to obtain node information, get or set the lead master, or perform any other operation that targets the other location. Essentially, this organization renders the --location parameter to harpctl unusable.
These considerations are integrated into TPAexec as well. When deploying a cluster using etcd, it constructs a separate DCS cluster per location, favoring high availability over strict consistency.
Thus this configuration example groups any DCS nodes assigned to the first location together, and the second location is a separate cluster:
To override this behavior, set harp_location explicitly to force a particular grouping.
Thus this example returns all etcd nodes into a single cohesive DCS layer:
The harp_location override might also be necessary to favor specific node groupings when using cloud providers such as Amazon that favor availability zones in regions over traditional data centers.
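The grouping rule described in this section can be sketched in Python. This is an illustration only, not TPAexec's implementation. The node dictionaries, the name and location keys, the host names, and the group_dcs_nodes function are all hypothetical. The sketch assumes that a harp_location value, when present, takes precedence over the node's location.

```python
from collections import defaultdict
from typing import Dict, List


def group_dcs_nodes(nodes: List[dict]) -> Dict[str, List[str]]:
    """Group DCS node names into one DCS cluster per effective location.

    Assumption: harp_location, if set, overrides the node's location.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        key = node.get("harp_location") or node["location"]
        groups[key].append(node["name"])
    return dict(groups)


# Hypothetical etcd nodes spread over two locations.
default_nodes = [
    {"name": "etcd-a1", "location": "first"},
    {"name": "etcd-a2", "location": "first"},
    {"name": "etcd-b1", "location": "second"},
]

# The same nodes with an explicit override that merges them.
merged_nodes = [dict(node, harp_location="all") for node in default_nodes]

if __name__ == "__main__":
    print(group_dcs_nodes(default_nodes))
    # {'first': ['etcd-a1', 'etcd-a2'], 'second': ['etcd-b1']}
    print(group_dcs_nodes(merged_nodes))
    # {'all': ['etcd-a1', 'etcd-a2', 'etcd-b1']}
```

The same override could, for example, treat several availability zones in one region as a single grouping.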