Seamless major upgrades with blue-green deployments (Innovation Release)

A blue-green approach for major upgrades

Reliable, safe, and instant major upgrades are possible with distributed high-availability (DHA) clusters. The method consists of using a blue-green deployment approach for your distributed database based on different data groups:

  • The existing blue data group represents the current stable environment (for example, your production database).

  • A new green data group is created in the database cluster and upgraded to the next major version.

  • You perform comprehensive tests in your application by connecting to the green data group. (A new connection string is available.)

  • You redirect all client workloads to the green data group.

  • You delete the blue data group after the upgrade is fully validated.

Prerequisites

  • Have a DHA cluster with the old version (blue) data group on Hybrid Manager (HM) up and running in a healthy state.

  • Have the next Postgres major version available in the asset library.

General recommendations

  • Always perform a full backup before starting the upgrade process.

  • If feasible, test your upgrade process in a preproduction environment.

  • We recommend completing the upgrade process in a short period of time (for example, the same day). Running different major versions in data groups can lead to unexpected incompatibility issues.

Check the cluster status

To ensure the cluster is in a healthy state, from the cluster main page, select the Health Status tab. Then check the cluster's:

  • Raft Status. Make sure it reads OK.
  • Replication Slot Status. Make sure it reads OK.
  • Clock Skew. Make sure it reads OK.
  • Proxy Status. Make sure all the proxies are up.
  • Node Status. Make sure all the nodes are up.
  • Transaction Rate. Make sure it's within acceptable/expected parameters.

You can explore more details about each data group and the nodes in it, including:

  • The data group's Proxy Status
  • The data group's Node Status
  • The following details of each node in the group:
    • Number of connections
    • WAL size
    • Memory percentage used
    • Storage percentage used

You can look at the latest Replication Status details to see the replication lag between nodes.

To ensure all data is being synchronized as expected, you can use the PGD CLI to view data group state or perform deeper monitoring on the replication peers.

If any of these metrics aren't within acceptable/expected parameters, address these issues before proceeding with the upgrade process.
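
The checklist above can be sketched as a small pre-flight helper. This is a minimal illustration, not part of Hybrid Manager: the `HealthSnapshot` structure and `blocking_issues` function are hypothetical names, and the values are assumed to be copied by hand from the Health Status tab. Transaction Rate is left out because its acceptable range depends on your workload.

```python
from dataclasses import dataclass, field


@dataclass
class HealthSnapshot:
    """Values read from the Health Status tab (hypothetical structure)."""
    raft_status: str
    replication_slot_status: str
    clock_skew: str
    proxies_up: dict = field(default_factory=dict)  # proxy name -> is it up?
    nodes_up: dict = field(default_factory=dict)    # node name -> is it up?


def blocking_issues(snapshot: HealthSnapshot) -> list:
    """Return the checklist items to address before starting the upgrade."""
    issues = []
    for label, value in [
        ("Raft Status", snapshot.raft_status),
        ("Replication Slot Status", snapshot.replication_slot_status),
        ("Clock Skew", snapshot.clock_skew),
    ]:
        # Each of these statuses is expected to read OK.
        if value.strip().upper() != "OK":
            issues.append(f"{label} reads {value!r}, expected OK")
    issues += [f"Proxy {name} is down" for name, up in snapshot.proxies_up.items() if not up]
    issues += [f"Node {name} is down" for name, up in snapshot.nodes_up.items() if not up]
    return issues


snapshot = HealthSnapshot("OK", "OK", "OK", {"proxy-1": True}, {"node-1": True, "node-2": False})
print(blocking_issues(snapshot))
```

An empty list means none of the listed status checks blocks the upgrade. You still need to review the transaction rate and replication lag yourself.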

Edit the cluster

To begin the upgrade process, from the Quick Actions menu in the upper-right corner, select Edit Cluster. Then:

  1. Select the Cluster Settings tab and set your password again.

  2. Select the Data Groups tab, find your old version (blue) data group, and select Duplicate. This brings up a new data group below the ones you already have.

    You can change the group name from the default, change the Nodes architecture (from two nodes to three or from three to two), and change the Deployment Location, Database Type, Instance Size, Storage, Networking, Backups, and Security settings. In most cases, though, leave everything except possibly the Group Name unchanged, because the goal is to replace your old version data group with a new version data group that can handle the same load. If a particular circumstance requires a different setting, this is the point in the upgrade process to change it.

  3. Select Create Data Group. If you had an odd number of data groups before duplicating (for example, only one), the cluster now has an even number of data groups, so a witness group with the default name Witness Group is added to your set of data groups. The witness group keeps Raft consensus possible.

    This is necessary with two data groups because an even number of data groups can lead to what's known as a "split brain" scenario: half of the groups want to make one decision, and the other half want to make another. The witness group prevents this scenario by acting as the tie breaker.

  4. Select Save to save the witness group's configuration.

    You can see your old (blue) data groups, new (green) data group, and witness group in the Data Groups tab.

  5. On the Data Groups tab, select Save.

    The Confirm Changes pop-up lists the changes to be made to the cluster and warns that enacting the changes could trigger database restarts across the cluster. These restarts don't mean that you incur downtime. At least one node in your old version data group stays active during this time.

  6. Select Confirm Changes. The changes can take quite some time depending on your underlying hardware and database size.
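
The tie-breaking role of the witness group described in step 3 comes down to majority arithmetic. The following sketch is a simplification that treats each group as a single voter. Actual Raft voting in the cluster happens among nodes, and the function names are illustrative only.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voters // 2 + 1


def can_decide_after_even_split(groups: int) -> bool:
    """True if one side still holds a majority when the groups divide
    as evenly as possible into two camps."""
    larger_side = (groups + 1) // 2
    return larger_side >= majority(groups)


# Two data groups (blue and green): a 1-1 split leaves no majority.
print(can_decide_after_even_split(2))
# Two data groups plus a witness group: a 2-1 split still has a majority.
print(can_decide_after_even_split(3))
```

With an even number of voters, an even split leaves neither side with a majority. Adding one more voter guarantees that one side always has more than half.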

Verify the data in the green data group

Ensure the entire cluster is in a healthy state.

From the Overview tab, you can copy the connection string for your new data group. Each data group has a different connection string exposed so that any applications connected to the existing data group aren't impacted until the process is complete. You can use the new connection string to perform tests such as validating that all data is synchronized.
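
One coarse way to validate synchronization is to compare per-table row counts gathered through each connection string. The sketch below assumes you've already collected those counts (for example, with `SELECT count(*)` per table) into two dictionaries. The function name and the table names are hypothetical. On a busy cluster, counts can differ briefly because of replication lag, so treat a mismatch as a prompt to investigate rather than as proof of data loss.

```python
def row_count_mismatches(blue: dict, green: dict) -> dict:
    """Map each table whose counts differ to (blue_count, green_count).

    A table missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(blue.keys() | green.keys()):
        pair = (blue.get(table), green.get(table))
        if pair[0] != pair[1]:
            mismatches[table] = pair
    return mismatches


# Example counts; in practice these come from queries against each data group.
blue_counts = {"public.orders": 100, "public.users": 10}
green_counts = {"public.orders": 100, "public.users": 9}
print(row_count_mismatches(blue_counts, green_counts))
```

Matching row counts don't guarantee identical content. For critical tables, consider comparing checksums or sampled rows as well.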

Upgrade the green data group to the next major version

  1. From the Overview tab, identify the new data group. From the ellipsis menu, select Upgrade data group.

  2. From the Software Image list, select the target version. We recommend upgrading to the latest minor version available for the next major version.

  3. Select Continue. You see the details of the upgrade.

  4. Select Upgrade Cluster to start the data group upgrade. Wait a few minutes until the process is complete.

Verify the upgrade

The Overview and Properties tabs show the new version for the data group. Ensure the cluster is in a healthy state.
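
You can also confirm the version from a client session by running `SHOW server_version_num;` against each data group and comparing the results. The helper below is a minimal sketch with hypothetical function names. It assumes the database type reports the standard PostgreSQL numbering used since version 10, where the value is the major version times 10000 plus the minor version.

```python
def major_version(server_version_num: int) -> int:
    """Major version for PostgreSQL 10 and later (major * 10000 + minor)."""
    if server_version_num < 100000:
        raise ValueError("numbering scheme differs before PostgreSQL 10")
    return server_version_num // 10000


def is_next_major(blue_version_num: int, green_version_num: int) -> bool:
    """True if the green data group runs exactly one major version ahead of blue."""
    return major_version(green_version_num) == major_version(blue_version_num) + 1


# Example values as returned by SHOW server_version_num on each data group.
print(major_version(170002))
print(is_next_major(160004, 170002))
```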

Plan your tests

We strongly recommend that you rigorously test applications and clients against the upgraded version before switching them over from your blue environment.

If your tests require write operations or schema changes, those changes are replicated to the old version data group. To manage this, you can modify the replication configuration of your cluster by enabling or disabling the replication subscriptions.

For example, you can use the following SQL command to stop changes made in your new upgraded version data group from replicating to your old version data groups. Execute the command while connected to a node in the old version data group. It disables the subscriptions that replicate changes from the new data group.

SELECT bdr.alter_subscription_disable(sub_name, true, true)
    FROM bdr.subscription s, bdr.node n, bdr.node_group ng
  WHERE n.node_group_id = ng.node_group_id
    AND s.source_node_id = n.node_id
    AND ng.node_group_name = '<new version data group name>';

Plan the switchover

After you've performed all tests, you're ready to switch workloads to the green data group using the new connection string. This might require changes in your applications, clients, or infrastructure services.
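
One way to keep the application change small is to have the application read its connection string from configuration so that the switchover becomes a configuration change. The sketch below is only an illustration under that assumption. The environment variable names `APP_DB_GROUP`, `APP_DB_DSN_BLUE`, and `APP_DB_DSN_GREEN` are hypothetical and not defined by Hybrid Manager.

```python
import os


def active_connection_string(env=os.environ) -> str:
    """Pick the connection string for the data group currently serving traffic.

    APP_DB_GROUP selects "blue" or "green" and defaults to "blue".
    All variable names here are hypothetical.
    """
    group = env.get("APP_DB_GROUP", "blue").lower()
    if group not in ("blue", "green"):
        raise ValueError(f"APP_DB_GROUP must be 'blue' or 'green', got {group!r}")
    return env[f"APP_DB_DSN_{group.upper()}"]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "APP_DB_GROUP": "green",
    "APP_DB_DSN_BLUE": "postgresql://blue.example/app",
    "APP_DB_DSN_GREEN": "postgresql://green.example/app",
}
print(active_connection_string(example_env))
```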

If you previously disabled the two-way synchronization between the new and old version data groups, you need to reenable it:

SELECT bdr.alter_subscription_enable(sub_name, true)
    FROM bdr.subscription s, bdr.node n, bdr.node_group ng
  WHERE n.node_group_id = ng.node_group_id
    AND s.source_node_id = n.node_id
    AND ng.node_group_name = '<old version data group name>';
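
If you run the disable and enable statements from a script, passing the group name as a bound parameter avoids editing the SQL by hand. The following sketch builds the same statements shown in this topic. The function name is hypothetical, and the `%s` placeholder assumes a DB-API driver that uses that parameter style, such as psycopg.

```python
# The calls and their arguments are copied from the SQL examples in this topic.
_TOGGLE_CALLS = {
    "disable": "bdr.alter_subscription_disable(sub_name, true, true)",
    "enable": "bdr.alter_subscription_enable(sub_name, true)",
}


def subscription_toggle_query(action: str, source_group: str):
    """Return (sql, params) that toggles subscriptions sourced from source_group."""
    if action not in _TOGGLE_CALLS:
        raise ValueError(f"action must be 'disable' or 'enable', got {action!r}")
    if not source_group.strip():
        raise ValueError("source_group must not be empty")
    sql = (
        f"SELECT {_TOGGLE_CALLS[action]}\n"
        "  FROM bdr.subscription s, bdr.node n, bdr.node_group ng\n"
        " WHERE n.node_group_id = ng.node_group_id\n"
        "   AND s.source_node_id = n.node_id\n"
        "   AND ng.node_group_name = %s"
    )
    return sql, (source_group,)


# "green-group" is a placeholder for your data group name.
sql, params = subscription_toggle_query("disable", "green-group")
print(sql)
print(params)
```

With an open cursor from such a driver, you would pass the pair as `cursor.execute(sql, params)`.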

Ensure the cluster is in a healthy state.

Remove the blue data groups

The last step in the upgrade procedure is to remove any old version data groups. On the cluster page, in the Overview tab, open the ellipsis menu for the old version data group you want to remove and select Delete Data Group.

Confirm the deletion. Repeat these steps for any other old version data groups, and remove the witness group if you no longer need it. The upgrade is then complete.

Ensure the cluster is in a healthy state.

Rollback considerations

With this approach, after you delete the old version data group, no automated rollback is available. To undo the major upgrade, consider restoring the latest backup in a new data group running the desired major version and switching application and client workloads to that new data group.