Seamless major upgrades with blue-green deployments v1.3.2

A blue-green approach for major upgrades

Reliable, safe, and instant major upgrades are possible with Distributed High Availability clusters. The method applies a blue-green deployment approach to your distributed database, using separate data groups:

  • The existing blue data group represents the current stable environment (e.g. your Production database).

  • A new green data group is created in the database cluster and upgraded to the next major version.

  • You perform comprehensive tests in your application by connecting to the green data group (a new connection string is available).

  • You redirect all client workloads to the green data group.

  • Finally, you delete the blue data group once the upgrade is fully validated.

Prerequisites

  • Have a Distributed High Availability cluster on Hybrid Manager, with the old version (blue) data group up and running in a healthy state.

  • Have the next Postgres major version available in Asset Library.

General recommendations

  • Always perform a full backup before starting the upgrade process.

  • If feasible, test your upgrade process in a pre-production environment.

  • We recommend completing the upgrade process in a short period of time (e.g. same day). Running different major versions in data groups can lead to unexpected incompatibility issues.

Check the cluster status

To ensure the cluster is in a healthy state, select the Health Status tab from the cluster main page. Then check the cluster's:

  • Raft Status: to make sure it is reading "OK"
  • Replication Slot Status: to make sure it is reading "OK"
  • Clock Skew: to make sure it is reading "OK"
  • Proxy Status: to make sure all the proxies are up
  • Node Status: to make sure all the nodes are up
  • Transaction Rate: to make sure it is within acceptable/expected parameters

You can further explore details about each data group and the nodes in it, including:

  • The data group's Proxy Status
  • The data group's Node Status
  • The following details of each node in the group:
    • number of Connections
    • WAL size
    • Memory percentage used
    • Storage percentage used

You can look at the latest Replication Status details to see the replication lag between nodes.

Furthermore, you can use the PGD CLI to view data group state or perform deeper monitoring on the replication peers to ensure all data is being synchronized as expected.

If any of these metrics are not within acceptable/expected parameters, you should address these issues before moving on with the upgrade process.
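
As a complement to the Health Status tab, similar checks can be sketched in SQL from any data node. This is a minimal sketch, assuming a PGD 5 cluster where the bdr.monitor_group_raft() and bdr.monitor_local_replslots() functions and the bdr.node_summary and bdr.node_slots views are available. Object and column names can differ between PGD versions, so check them against your release first.

```sql
-- Raft consensus status; the status column is expected to read OK.
SELECT * FROM bdr.monitor_group_raft();

-- Replication slot status for the node you are connected to.
SELECT * FROM bdr.monitor_local_replslots();

-- State of every node, listed per data group.
SELECT node_name, node_group_name, peer_state_name
  FROM bdr.node_summary
 ORDER BY node_group_name, node_name;

-- Replication lag toward each peer, as seen from this node.
SELECT target_name, slot_name, replay_lag, replay_lag_size
  FROM bdr.node_slots
 ORDER BY target_name;
```

Running the same queries on one node of each data group gives a view of replication in both directions.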

Edit the cluster

Start by opening the Quick Actions drop-down menu in the upper-right corner of the page and selecting Edit Cluster to begin the upgrade process:

  • Next, select the Cluster Settings tab and set your password again.

  • Then, select the Data Groups tab, find your old version “blue” data group, and select the Duplicate button. This brings up a new data group below the one(s) you already have.

  • You can change the group name from the default, change the Nodes architecture (two to three nodes or three to two nodes), and change the Deployment Location, Database Type, Instance Size, Storage, Networking, Backups, and Security settings. In most cases, leave everything the same except possibly the Group Name, since the goal is to replace your old version data group with a new version data group that can handle the same load. If a particular circumstance requires a change, you can make it at this point in the upgrade process.

  • If you had an odd number of data groups up until this point, adding the new data group leaves you with an even number. In that case, as soon as you select the Create Data Group button, a witness group with the default name Witness Group is added to your set of data groups. The witness group keeps Raft consensus possible. With an even number of data groups (for example, two), a "split brain" scenario can occur, where half of the groups want to make one decision and the other half want to make another. The witness group prevents this by acting as the tie breaker.

  • Select the Save button to save the witness group's configuration.

  • Now you can see your old blue data group(s), the new green data group, and the witness group in the Data Groups tab. Select the Save button on this screen to open a Confirm Changes pop-up that lists the changes to be made to the cluster. The pop-up also warns that enacting the changes could trigger database restarts across the cluster. This doesn't mean that you incur downtime: at least one node in your old version data group stays active during this time.

  • Select the Confirm Changes button in the pop-up to begin applying the requested cluster changes. This may take quite some time, depending on your underlying hardware and database size.
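
The tie-breaking role of the witness group described in the steps above can be illustrated with simple majority arithmetic. The query below is an illustrative sketch only: it treats each group as a single vote, which is a simplification, and it doesn't read anything from the cluster.

```sql
-- Majority needed and failures tolerated for 2 and 3 voting groups.
-- Integer division applies: 2/2 + 1 = 2 and 3/2 + 1 = 2.
SELECT n               AS voting_groups,
       n / 2 + 1       AS majority,
       n - (n / 2 + 1) AS failures_tolerated
  FROM generate_series(2, 3) AS n;
```

With two groups, the majority is two, so losing either group leaves no majority. With the witness group as a third vote, the majority is still two, so one group can be lost and consensus remains possible.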

Verify the data in the green data group

Ensure the entire cluster is in a healthy state.

From the Overview tab, you can copy the connection string for your new data group. Each data group will have a different connection string exposed so that any application connected to the existing data group won't be impacted until the process is complete. You can use the new connection string to perform tests such as validating that all data is synchronized.
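
One way to validate synchronization is to run the same query through both connection strings and compare the results. The sketch below assumes a hypothetical table public.orders with a primary key column id; substitute your own tables and keys. Results only match if no writes land between the two runs, and hashing a very large table is expensive, so consider restricting the query to a key range.

```sql
-- Run once via the blue connection string and once via the green one;
-- the two result rows should match when the groups are in sync.
-- public.orders and id are hypothetical names.
SELECT count(*)                                    AS row_count,
       md5(string_agg(o::text, ',' ORDER BY o.id)) AS content_hash
  FROM public.orders AS o;
```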

Upgrade the green data group to the next major version

  • From the Overview tab, identify the new data group, open the three-dots actions menu, and select Upgrade data group.

  • Then, select the target version from the “Software Image” drop-down list. We recommend upgrading to the latest minor version available for the next major version. Click “Continue” to see the details of the upgrade.

  • Select the Upgrade Cluster button in the pop-up to start the data group upgrade. Wait a few minutes until the process is complete.

Verify the upgrade

Both the Overview and Properties tabs show the new version for the data group. Ensure the cluster is in a healthy state.
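
The version can also be confirmed in SQL through the green data group's connection string. SHOW server_version is standard Postgres. The bdr.group_versions_details view and its columns are assumed to be available as in PGD 5; verify them for your release.

```sql
-- Postgres version of the node you are connected to.
SHOW server_version;

-- Postgres and PGD versions reported by each node in the cluster.
SELECT node_name, postgres_version, bdr_version
  FROM bdr.group_versions_details
 ORDER BY node_name;
```

While both data groups coexist, expect the nodes of the blue data group to keep reporting the old major version.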

Plan your tests

We strongly recommend that you rigorously test your applications and clients against the upgraded green data group before switching them away from your “blue” environment.

If your tests require write operations or schema changes, those changes are automatically replicated to the old version data group. To manage this, you can modify the replication configuration of your cluster by enabling or disabling the replication subscriptions.

For example, the following SQL command stops changes made in your new upgraded version data group from being replicated to your old version data group(s). Execute it while connected to a node in the old version data group. It disables the subscription(s) that replicate changes from the new data group.

SELECT bdr.alter_subscription_disable(sub_name, true, true)
    FROM bdr.subscription s, bdr.node n, bdr.node_group ng
  WHERE n.node_group_id = ng.node_group_id
    AND s.source_node_id = n.node_id
    AND ng.node_group_name = '<new version data group name>';
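
To confirm the effect, you can list the subscriptions on the same node. This sketch assumes the bdr.subscription_summary view with its sub_enabled and subscription_status columns, as found in PGD 5; verify the column names for your version.

```sql
-- Run on a node in the old version data group.
-- Subscriptions whose origin is a node in the new data group are
-- expected to show sub_enabled = false after the disable command.
SELECT sub_name, origin_name, target_name, sub_enabled, subscription_status
  FROM bdr.subscription_summary
 ORDER BY origin_name;
```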

Plan the switchover

Once all tests are performed, you are ready to switch workloads to the green data group using the new connection string. This might require changes in your applications, clients, or infrastructure services.
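
After redirecting a client, a quick query through its connection can confirm which data group it reached. This is a sketch that assumes the bdr.local_node_summary view is available, as in PGD 5.

```sql
-- Shows the node and data group serving the current connection;
-- node_group_name should be the green data group after the switch.
SELECT node_name, node_group_name
  FROM bdr.local_node_summary;
```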

If you previously disabled replication from the new version data group to the old version data group(s), you need to re-enable it so that two-way synchronization resumes. Run the following command while connected to a node in the old version data group, so that it matches the same subscriptions you disabled earlier:

SELECT bdr.alter_subscription_enable(sub_name, true)
    FROM bdr.subscription s, bdr.node n, bdr.node_group ng
  WHERE n.node_group_id = ng.node_group_id
    AND s.source_node_id = n.node_id
    AND ng.node_group_name = '<new version data group name>';

Ensure the cluster is in a healthy state.

Remove the blue data group(s)

The last step in the upgrade procedure is to remove the old version data group(s). On the cluster page's Overview tab, open the three-dots actions menu for the old version data group you want to remove and select the Delete Data Group action.

You are then presented with a pop-up to confirm the deletion. Confirm it, then repeat the removal process for any other old version data groups and, if it is no longer necessary, for the Witness data group. The upgrade is then complete.

Ensure the cluster is in a healthy state.
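
A final check in SQL can help confirm that only the expected nodes remain. The bdr.node_summary view is assumed to be available as in PGD 5; pg_replication_slots is standard Postgres.

```sql
-- Remaining nodes and their state; review the list for any node
-- that still belongs to the deleted data group.
SELECT node_name, node_group_name, peer_state_name
  FROM bdr.node_summary
 ORDER BY node_group_name, node_name;

-- Inactive replication slots retain WAL; investigate any you find.
SELECT slot_name, active
  FROM pg_replication_slots
 WHERE NOT active;
```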

Rollback considerations

There's no automated rollback available with this approach once the old version data group is deleted. To undo the major upgrade, consider restoring the latest backup in a new data group running the desired major version and switching application and client workloads to that new data group.