The Current State of Major PostgreSQL Upgrades with CloudNativePG

August 07, 2023

In one of our previous blog posts, Leverage a New Way to Import an Existing Postgres Database to Kubernetes, we demonstrated how to migrate a PostgreSQL 10 database that’s in RDS to a CloudNativePG managed Postgres cluster inside your Kubernetes environment. We used the database import capability, which is based on logical backup and restore (i.e. pg_dump and pg_restore).

However, this type of migration has one major limitation: any change made to the origin database after you start the process will not be included in the target database. This means that cutting over from the previous database to the new one requires you to stop write operations on the origin’s primary until the entire data migration is completed. Hence, we refer to these operations as offline database import and, when you are also moving to a newer major version of PostgreSQL, offline major upgrades.

The good thing about the above process is that you can:

  • Verify that the import/upgrade processes work on a new cluster, and test your applications against the newer version of Postgres
  • Effectively estimate the cutover time, which is primarily determined by the duration of the import process

The first item determines if you are ready to migrate to the major version.

The second item determines whether an offline upgrade aligns with your organization’s expectations for the downtime that the cutover process imposes on write operations to the database. If the estimated cutover time is acceptable, you can proceed with the offline major upgrade. If not, you can explore Postgres’ native logical replication to bring the cutover time down to almost zero.

The goal of this article is to provide an update on the current way to achieve ~0 cutover time in CloudNativePG when importing an existing database or performing a Postgres major upgrade. These are also known as online database import and major online upgrades, respectively.

A major online upgrade is essentially the same as an online database import, the only difference being that, in a major upgrade, the Postgres major version of the destination database must be higher than that of the origin database. Major online upgrades are the main topic of this article.

Major online upgrades

Major online upgrades have been around in PostgreSQL for many, many years. They were initially achieved through extensibility and triggers: the main idea was that every change in the database (inserts, updates, deletes) would be pushed to a queue via a trigger, then consumers would pull the changes from the queue and apply them to the destination - the queue itself being a set of rotating tables in the database. A popular tool was Londiste, initially developed at Skype, which managed synchronization of data and sequences and guaranteed zero cutover time. However, a significant limitation of the trigger-based approach was its impact on the primary database, as every single change had to generate an event to be stored in the queue.

When PostgreSQL 9.4 introduced logical decoding, extensions like pglogical 2 became popular for major online upgrade procedures, as logical decoding was much lighter than trigger-based replication. Improvements continued (and still continue!), and PostgreSQL 10 introduced the first declarative support for logical replication through publications and subscriptions. This article uses this native capability of PostgreSQL to perform major online upgrades with CloudNativePG.
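
To recall what the native interface looks like (the names below are purely illustrative), a publication is created on the origin database and a subscription on the destination, for example:

-- On the origin database (the publisher)
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On the destination database (the subscriber)
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=origin-host dbname=app user=replicator password=secret'
  PUBLICATION my_pub;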

Before we continue, it is important to note that logical replication has a few limitations, in particular:

  • Data Definition Language (DDL) is not replicated
  • Data about sequences is not replicated
  • Large objects (LOBs) are not replicated

In practical terms, you are responsible for:

  • The database schema - given that DDL is not replicated, any schema change such as CREATE TABLE or ALTER TABLE won’t be propagated, so make sure you don’t change the schema during the upgrade (or, if you must, do it consciously and apply the change manually on both sides)
  • Updating the sequences as part of the cutover process (see the quick checks sketched right after this list)
  • Migrating large objects, if you have any (binary data in Postgres is nowadays typically stored in bytea columns, so you should not have this issue)
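
For instance, you can run a couple of quick checks on the origin database to see what you will have to take care of (a minimal sketch, assuming PostgreSQL 10 or later on the origin):

-- List the sequences you will need to realign at cutover time
SELECT schemaname, sequencename, last_value FROM pg_sequences;

-- Count large objects (0 means there is nothing to migrate manually)
SELECT count(*) FROM pg_largeobject_metadata;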

The database of origin

This is the database you want to move into Kubernetes through CloudNativePG. It could be anywhere: the PostgreSQL 10 database in RDS described in "Leverage a New Way to Import an Existing Postgres Database to Kubernetes", or a database in the same Kubernetes cluster, managed by another operator or by CloudNativePG itself (e.g. you want to upgrade a Postgres 13 database to a Postgres 16 one).

The good news is that it doesn’t make any difference for us.

All we need are the following requirements:

  • A list of all the tables we want to import (valid options are also all tables in a database or all tables in a schema)
  • A user with logical replication permissions that can connect to the origin database from the new database we will create in Kubernetes, and that has permission to read the aforementioned tables
  • A logical replication PUBLICATION in the PostgreSQL database

Given that the origin database can be a generic Postgres installation, the above requirements must be fulfilled by you - most likely in an automated fashion using a script. For educational purposes, below I’m providing a YAML file for CloudNativePG that creates a PostgreSQL 13 instance called pg13 in your Kubernetes cluster, matching the above requirements:

apiVersion: v1
data:
  password: QW5ndXNZb3VuZzE5NTU=
  username: YW5ndXM=
kind: Secret
metadata:
  name: angus
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg13
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:13
  enableSuperuserAccess: false
  instances: 1

  storage:
    size: 1Gi

  bootstrap:
    initdb:
      dataChecksums: true
      walSegmentSize: 32
      postInitApplicationSQL:
      - CREATE TABLE songs (id SERIAL PRIMARY KEY, title TEXT)
      - ALTER TABLE songs OWNER TO app
      - INSERT INTO songs(title) VALUES ('Back in black')
      - CREATE PUBLICATION migrate FOR TABLE songs

  managed:
    roles:
    - name: angus
      ensure: present
      comment: User for logical replication connections
      login: true
      replication: true
      inRoles:
      - app
      passwordSecret:
        name: angus

  postgresql:
    pg_hba:
    # Logical replication connections match the target database (app), not the "replication" keyword
    - hostssl app angus 10.0.0.0/8 md5

As you can see, I am using a generic PostgreSQL 13 image (imageName) from the CloudNativePG community repository. The bootstrap section contains a series of SQL statements executed by the postgres user in the application database (called app), which is by default created by the operator. These are the statements:

  • Create a table called songs with a SERIAL primary key (which implicitly creates a sequence)
  • Transfer the ownership of the table to the application user (app), which is the owner of the app database
  • Insert a randomly chosen song in the table
  • Create a publication called migrate with just the songs table (here we could have used ALL TABLES too - for details refer to "CREATE PUBLICATION")

The .spec.managed.roles section is responsible for creating the angus role (the name is completely random!) that we will use to connect from the destination database, by:

  • Making sure the user is part of the app role (a simple trick to ensure that angus will be able to read the songs table in this example)
  • Referencing the secret containing the password (called angus)

The secret containing the password is at the start of the above example and has been generated using the following command:

kubectl create secret generic angus \
  --dry-run=client -o yaml \
  --from-literal=username=angus \
  --from-literal=password=AngusYoung1955

Please proceed with applying the above manifest to your Kubernetes cluster. If you don’t have one, don’t despair: you can use KinD (Kubernetes in Docker) as explained in the CloudNativePG Quickstart - which is what I am doing here, after installing the latest version of the operator.
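
For example, assuming you saved the manifest above in a file called pg13.yaml (the file name is entirely up to you), you can apply it with:

$ kubectl apply -f pg13.yaml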

After a few seconds, your pg13 cluster will be up:

$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
pg13-1   1/1     Running   0          29s

You can now verify the songs table is in the app database and is owned by the app user:

$ kubectl exec -ti pg13-1 -c postgres -- psql app -c '\d'
            List of relations
 Schema |     Name     |   Type   | Owner
--------+--------------+----------+-------
 public | songs        | table    | app
 public | songs_id_seq | sequence | app
(2 rows)

Let’s now query the songs table:

$ kubectl exec -ti pg13-1 -c postgres -- psql app -c 'SELECT * FROM songs'
 id |     title
----+---------------
  1 | Back in black
(1 row)

As expected, the angus user is there and is a member of the app role:

$ kubectl exec -ti pg13-1 -c postgres -- psql app -c '\du angus'
            List of roles
 Role name |  Attributes | Member of
-----------+-------------+-----------
 angus     | Replication | {app}

Finally, let’s check the publication:

$ kubectl exec -ti pg13-1 -c postgres -- psql app -c '\dRp+'
                            Publication migrate
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
 postgres | f          | t       | t       | t       | t         | f
Tables:
    "public.songs"

Everything is in line with our configuration file.

The database of destination

While the database of origin can vary from case to case, the destination database will certainly be managed by CloudNativePG, so we are free to define it as we like. Our goal here is to create a new PostgreSQL 16 instance (called pg16) - NOTE: even though, at the time of writing, PostgreSQL 16 is still in beta, we deliberately use it to encourage broader testing ahead of the final release, which the PGDG expects to ship before the end of 2023.

For didactic purposes, we assume that both pg13 and pg16 live in the same namespace of the same Kubernetes cluster. However, you can adapt the example below to your own environment as needed.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg16
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  enableSuperuserAccess: false
  instances: 1

  storage:
    size: 1Gi

  bootstrap:
    initdb:
      dataChecksums: true
      walSegmentSize: 32
      import:
        schemaOnly: true
        type: microservice
        databases:
        - app
        source:
          externalCluster: pg13
        postImportApplicationSQL:
        - CREATE SUBSCRIPTION migrate CONNECTION 'host=pg13-rw user=angus dbname=app sslmode=require password=AngusYoung1955' PUBLICATION migrate

  externalClusters:
  - name: pg13
    connectionParameters:
      # Use the correct IP or host name for the source database
      host: pg13-rw
      user: angus
      dbname: postgres
    password:
      name: angus
      key: password

As you can see, we create a new Postgres instance from scratch and, when it is up, we request the import of only the schema of the app database from the pg13 cluster, as defined in the externalClusters section. Specifically, we connect to the primary, identified by the pg13-rw service, using the angus user, whose password is stored in the same angus secret used by pg13.

Once the import is completed, we request the operator to create a subscription to the migrate publication on pg13. IMPORTANT: if you care about security as we do, you have certainly noticed that the SUBSCRIPTION’s connection string contains a password - issue #2386 will address this, with a fix starting from 1.20.2.

Please apply the above manifest to your Kubernetes cluster, and you’ll see that, after the import job is completed, the pg16 cluster is up.

$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
pg13-1   1/1     Running   0          10m
pg16-1   1/1     Running   0          2m36s

The import phase correctly created the schema:

$ kubectl exec -ti pg16-1 -c postgres -- psql app -c '\d'
            List of relations
 Schema |     Name     |   Type   | Owner
--------+--------------+----------+-------
 public | songs        | table    | app
 public | songs_id_seq | sequence | app
(2 rows)

Let’s now query the songs table in the pg16 instance’s app database:

$ kubectl exec -ti pg16-1 -c postgres -- psql app -c 'SELECT * FROM songs'
 id |     title
----+---------------
  1 | Back in black
(1 row)

The record is there, meaning it’s been copied via logical replication. Let’s now insert a new song in the pg13 instance and check if it gets replicated in pg16:

$ kubectl exec -ti pg13-1 -c postgres \
  -- psql app -c "INSERT INTO songs(title) VALUES('Thunderstruck')"
INSERT 0 1

Followed by:

$ kubectl exec -ti pg16-1 -c postgres \
  -- psql app -c 'SELECT * FROM songs'
 id |     title
----+---------------
  1 | Back in black
  2 | Thunderstruck
(2 rows)

Replication works!
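
If you want to double-check the state of the logical replication stream, you can also query the pg_stat_subscription view on the destination, for example:

$ kubectl exec -ti pg16-1 -c postgres -- psql app \
  -c 'SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription'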

All you need to do, before you promote the pg16 database to accept writes from your application, is to align the songs_id_seq sequence with its current value on pg13, using the setval() function, as shown below.
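
Here is a minimal sketch of this step for our single sequence - read the current value on pg13 and apply it on pg16 (in a real migration you would repeat this for every sequence, ideally from a script):

$ kubectl exec -ti pg13-1 -c postgres -- psql app -tAc \
  'SELECT last_value FROM songs_id_seq'
2

$ kubectl exec -ti pg16-1 -c postgres -- psql app \
  -c "SELECT setval('songs_id_seq', 2)"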

Conclusion

This article demonstrates that from CloudNativePG 1.20.1, thanks to the schemaOnly option of the import capability, you can perform major online upgrades and online import of databases using a mix of declarative and imperative solutions.

Overall, the online import/major upgrade process currently looks like the following:

  • You create the destination database as outlined in the pg16 example by performing the schema-only import
  • You create the subscription as part of the post import section or, if you prefer, by running the same SQL statement in an imperative way - replication starts
  • Once the origin and destination databases are in sync, verify the data
  • Initiate the downtime for your application
  • Update the sequences, using a script or a migration (this varies from case to case - see the sketch after this list)
  • When completed, deploy the new version of the application that points to the new Postgres Cluster
  • Terminate the application downtime
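
As a sketch of the sequence update step, you could generate the required setval() statements directly on the origin and then run the output against the destination’s app database (this assumes PostgreSQL 10 or later on the origin, where the pg_sequences view is available):

-- Run on the origin (pg13): one setval() statement per sequence in use
SELECT format('SELECT setval(%L, %s);',
              quote_ident(schemaname) || '.' || quote_ident(sequencename),
              last_value)
  FROM pg_sequences
 WHERE last_value IS NOT NULL;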

Our goal is to gather feedback about the currently available approach so that we can better prioritize enhancements in CloudNativePG to automate such a process and make it as declarative as possible. These might include:

  • declarative support for management of logical replication subscriptions
  • declarative support for management of logical replication publications
  • automated process for updating the sequences as part of the cutover procedure - even though this could be part of a script
  • a bootstrap method that enhances the import facility by coordinating the above steps

In conclusion, provided you work around the limitations of native logical replication, you can already move any PostgreSQL 10+ database into a new CloudNativePG-managed cluster running the same major version or a more recent one, with almost zero cutover time. The same methodology can also be used to move to CloudNativePG from a different Postgres operator for Kubernetes, with very little downtime.

Although the proposed approach requires additional resources for a limited amount of time, it provides the following benefits:

  • Time to test your applications on the new version, reducing the risk of post-migration issues
  • Rollback in case testing proves unsuccessful
  • More flexibility with extensions management, giving you the possibility to upgrade not only Postgres, but your core extensions too
  • Zero or almost zero cutover time

When CloudNativePG adds support for pg_upgrade (which, by the way, can already be used imperatively by leveraging the hibernation capability), you’ll have three ways to upgrade a PostgreSQL database:

  • Major offline upgrades via the import facility
  • Major online upgrades via the import facility and logical replication
  • In-place offline upgrades with pg_upgrade

It is up to you to decide which one suits you best, depending on your goals.

Please join the CloudNativePG community chat to share your ideas and participate in the bi-weekly development meetings to provide your feedback and help us improve the overall PostgreSQL experience in Kubernetes.

Finally, if you need professional support on the entire open source stack, we at EDB, as founders of the CloudNativePG operator and major contributors to the PostgreSQL project, are happy to help you with our Community 360 offering. Moreover, we allow you to run EDB Postgres Extended and EDB Postgres Advanced (with our Oracle compatibility layer) through our EDB Postgres for Kubernetes - also certified for Red Hat OpenShift platforms and available under Long Term Support terms. Reach out to us and enjoy running Postgres inside Kubernetes!

Featured picture: 2 Container ships along side in Le Verdon haven, which is a subsidiary of Bordeaux port, France, available under CC BY-SA 3.0.
