EDB Docs - EDB Postgres® AI for CloudNativePG™ Cluster v1.30.0

Storage

Storage is the most critical component in a database workload. Storage must always be available, scale, perform well, and guarantee consistency and durability. The same expectations and requirements that apply to traditional environments, such as virtual machines and bare metal, are also valid in container contexts managed by Kubernetes.

Important

When it comes to dynamically provisioned storage, Kubernetes has its own specifics. These include storage classes, persistent volumes, and Persistent Volume Claims (PVCs). You need to master these concepts, on top of all the valuable knowledge you've built over the years about storage for database workloads on VMs and physical servers.

There are two primary methods of access to storage:

Network – Either directly or indirectly. (Think of an NFS volume locally mounted on a host running Kubernetes.)
Local – Directly attached to the node where a pod is running. This also includes directly attached disks on bare metal installations of Kubernetes.

Network storage, which is the most common usage pattern in Kubernetes, presents the same issues of throughput and latency that you can experience in a traditional environment. These issues can be accentuated in a shared environment, where I/O contention with several applications increases the variability of performance results.

Local storage enables shared-nothing architectures, which are better suited to highly transactional and very large database (VLDB) workloads because they deliver higher and more predictable performance.

Warning

Before you deploy a PostgreSQL cluster with EDB Postgres® AI for CloudNativePG™ Cluster, ensure that the storage you're using is recommended for database workloads. We recommend clearly setting performance expectations by first benchmarking the storage using tools such as fio and then the database using pgbench.

Info

EDB Postgres® AI for CloudNativePG™ Cluster doesn't use StatefulSet for managing data persistence. Rather, it manages PVCs directly. If you want to know more, see Custom pod controller.

Backup and recovery

Since EDB Postgres® AI for CloudNativePG™ Cluster supports volume snapshots for both backup and recovery, we recommend that you also consider this aspect when you choose your storage solution, especially if you manage very large databases.

Important

See the Kubernetes documentation for a list of all the supported container storage interface (CSI) drivers that provide snapshot capabilities.
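To illustrate what snapshot support involves, the following sketch pairs a VolumeSnapshotClass with a Cluster that references it for volume snapshot backups. The driver csi.example.com and the names csi-snapclass and cluster-snapshot are hypothetical. Substitute a CSI driver from your environment that actually supports snapshots. The backup.volumeSnapshot.className field is assumed here from the operator's volume snapshot backup support, so check the backup documentation for your version before relying on it.

```yaml
# Hypothetical snapshot class: the driver must be a snapshot-capable
# CSI driver that is installed in your Kubernetes cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass          # hypothetical name
driver: csi.example.com        # hypothetical CSI driver
deletionPolicy: Delete
---
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-snapshot       # hypothetical name
spec:
  instances: 3
  storage:
    storageClass: standard
    size: 10Gi
  backup:
    volumeSnapshot:
      # Snapshot class used when backing up this cluster's volumes
      className: csi-snapclass
```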

Benchmarking EDB Postgres® AI for CloudNativePG™ Cluster

Before deploying the database in production, we recommend that you benchmark EDB Postgres® AI for CloudNativePG™ Cluster in a controlled Kubernetes environment. Follow the guidelines in Benchmarking.

Briefly, we recommend operating at two levels:

Measuring the performance of the underlying storage using fio, with relevant metrics for database workloads such as throughput for sequential reads, sequential writes, random reads, and random writes
Measuring the performance of the database using pgbench, the default benchmarking tool distributed with PostgreSQL

Important

You must measure both the storage and database performance before putting the database into production. These results are extremely valuable not just in the planning phase (for example, capacity planning). They are also valuable in the production lifecycle, particularly in emergency situations when you don't have time to run this kind of test. Databases change and evolve over time, and so does the distribution of data, potentially affecting performance. Knowing the theoretical maximum throughput of sequential reads or writes is extremely useful in those situations. This is true especially in shared-nothing contexts, where results don't vary due to the influence of external workloads.

Know your system: benchmark it.

Encryption at rest

Encryption at rest is possible with EDB Postgres® AI for CloudNativePG™ Cluster. The operator delegates it to the underlying storage class. See the documentation of your storage class for information about this important security feature.
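As an example, assuming the AWS EBS CSI driver (provisioner ebs.csi.aws.com), you enable encryption through the encrypted parameter of the storage class. The Cluster then only needs to reference that class. The names encrypted-gp3 and cluster-encrypted are hypothetical. Other CSI drivers expose encryption through their own parameters, which differ from these.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp3          # hypothetical name
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"            # the driver creates encrypted volumes
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-encrypted      # hypothetical name
spec:
  instances: 3
  storage:
    # Every PGDATA PVC inherits encryption from the storage class
    storageClass: encrypted-gp3
    size: 10Gi
```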

Persistent Volume Claim (PVC)

The operator creates a PVC for each PostgreSQL instance to store PGDATA, and it mounts that PVC into the instance's pod.

Additionally, it supports creating clusters with:

A separate PVC on which to store PostgreSQL WAL, as explained in Volume for WAL
Additional separate volumes reserved for PostgreSQL tablespaces, as explained in Tablespaces

In EDB Postgres® AI for CloudNativePG™ Cluster, the volumes attached to a single PostgreSQL instance are defined as a PVC group.
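As an illustration, the following Cluster defines a PVC group of three volumes per instance: one for PGDATA, one for WAL, and one for a tablespace. With three instances, the operator therefore manages nine PVCs. The cluster and tablespace names are hypothetical, and the tablespaces stanza is only a sketch. See Tablespaces for the authoritative field reference.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-pvc-group      # hypothetical name
spec:
  instances: 3
  storage:                     # PGDATA volume
    size: 10Gi
  walStorage:                  # dedicated pg_wal volume
    size: 2Gi
  tablespaces:                 # one dedicated volume per tablespace
    - name: analytics          # hypothetical tablespace name
      storage:
        size: 5Gi
```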

Configuration via a storage class

Important

EDB Postgres® AI for CloudNativePG™ Cluster was designed to work interchangeably with all storage classes. As usual, we recommend properly benchmarking the storage class in a controlled environment before deploying to production.

The easiest way to configure the storage for a PostgreSQL cluster is to request storage of a certain size, like in the following example:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-storage-class
spec:
  instances: 3
  storage:
    size: 1Gi

With the previous configuration, the default storage class satisfies the generated PVCs. If the target Kubernetes cluster has no default storage class, or if you need a specific storage class to satisfy your PVCs, you can set it in the custom resource:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-storage-class
spec:
  instances: 3
  storage:
    storageClass: standard
    size: 1Gi

Configuration via a PVC template

To further customize the generated PVCs, you can provide a PVC template inside the custom resource, like in the following example:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-pvc-template
spec:
  instances: 3

  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard
      volumeMode: Filesystem

Volume for WAL

By default, PostgreSQL stores all its data in the so-called PGDATA directory. One of the core directories inside PGDATA is pg_wal, which contains the log of transactional changes that occurred in the database, in the form of segment files. (In PostgreSQL versions before 10, pg_wal was named pg_xlog.)

Info

Normally, each segment is 16MB in size, but you can configure the size using the walSegmentSize option. This option is applied at cluster initialization time, as described in Bootstrap an empty cluster.
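The following minimal sketch sets a non-default segment size at bootstrap. It assumes that walSegmentSize sits under .spec.bootstrap.initdb and is expressed in megabytes. The cluster name and the value 32 are illustrative only.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-wal-segment    # hypothetical name
spec:
  instances: 3
  bootstrap:
    initdb:
      # Segment size in MB, passed to initdb; it can't be changed later
      walSegmentSize: 32
  storage:
    size: 1Gi
```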

In most cases, having pg_wal on the same volume where PGDATA resides is fine. However, having WALs stored in a separate volume has a few benefits:

I/O performance – By storing WAL files on different storage from PGDATA, PostgreSQL can exploit parallel I/O for WAL operations (normally sequential writes) and for data files (tables and indexes for example), thus improving vertical scalability.
More reliability – By reserving dedicated disk space for WAL files, you can be sure that exhausting space on the PGDATA volume never interferes with WAL writing. This behavior ensures that your PostgreSQL primary shuts down correctly.
Finer control – You can define the amount of space dedicated to both PGDATA and pg_wal, fine-tune WAL configuration and checkpoints, and even use a different storage class for cost optimization.
Better I/O monitoring – You can constantly monitor the load and disk usage on both PGDATA and pg_wal. You can also set alerts that notify you in case, for example, PGDATA requires resizing.

Write-Ahead Log (WAL)

See Reliability and the Write-Ahead Log in the PostgreSQL documentation for more information.

You can add a separate volume for WAL using the .spec.walStorage option. It follows the same rules described for the storage field and provisions a dedicated PVC. For example:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: separate-pgwal-volume
spec:
  instances: 3
  storage:
    size: 1Gi
  walStorage:
    size: 1Gi
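Because walStorage follows the same rules as storage, it can also point to a different storage class. For example, you can place WAL on faster disks and PGDATA on cheaper ones. In this sketch, standard and fast-ssd are hypothetical storage class names, and the sizes are illustrative.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: separate-pgwal-volume
spec:
  instances: 3
  storage:
    storageClass: standard     # hypothetical class for data files
    size: 10Gi
  walStorage:
    storageClass: fast-ssd     # hypothetical class for sequential WAL writes
    size: 2Gi
```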

Important

Removing walStorage isn't supported. Once added, a separate volume for WALs can't be removed from an existing Postgres cluster.

Volumes for tablespaces

EDB Postgres® AI for CloudNativePG™ Cluster supports declarative tablespaces. You can add one or more volumes, each dedicated to a single PostgreSQL tablespace. See Tablespaces for details.

Volume expansion

Kubernetes exposes an API for expanding PVCs, which is enabled by default. However, the underlying StorageClass must support it.

To check if a certain StorageClass supports volume expansion, you can read the allowVolumeExpansion field for your storage class:

$ kubectl get storageclass -o jsonpath='{$.allowVolumeExpansion}' premium-storage
true
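A storage class that permits expansion sets the field explicitly, as in the following minimal sketch. Here, csi.example.com is a hypothetical provisioner. The flag has an effect only if the underlying CSI driver actually implements volume expansion.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-storage
provisioner: csi.example.com   # hypothetical CSI driver
# Allow PVCs of this class to be resized after creation
allowVolumeExpansion: true
```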

Using the volume expansion Kubernetes feature

If the storage class supports volume expansion, you can change the size requirement of the Cluster, and the operator applies the change to every PVC.

If the StorageClass supports online volume resizing, the change is immediately applied to the pods. If the underlying storage class doesn't support that, you must delete the pod to trigger the resize.

The best way to proceed is to delete one pod at a time, starting from replicas and waiting for each pod to be back up.
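For example, to grow every PVC from 1Gi to 2Gi, you only need to raise storage.size. This sketch assumes that the premium-storage class supports expansion and that resizeInUseVolumes keeps its default (enabled) value. The cluster name is reused from the earlier storage class example.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-storage-class
spec:
  instances: 3
  storage:
    storageClass: premium-storage
    # Increased from 1Gi; the operator patches every PVC of the cluster
    size: 2Gi
```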

Re-creating storage

If the storage class doesn't support volume expansion, you can still regenerate your cluster on different PVCs: allocate new PVCs with increased storage and then move the database there. This operation is feasible only when the cluster contains more than one instance.

While you do that, you need to prevent the operator from changing the existing PVC by disabling the resizeInUseVolumes flag, like in the following example:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-pvc-template
spec:
  instances: 3

  storage:
    storageClass: standard
    size: 1Gi
    resizeInUseVolumes: false

To move the entire cluster to a different storage area, you need to re-create all the PVCs and all the pods. Suppose you have a cluster with three instances, like in the following example:

$ kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
cluster-example-1   1/1     Running   0          2m37s
cluster-example-2   1/1     Running   0          2m22s
cluster-example-3   1/1     Running   0          2m10s

To re-create the cluster using different PVCs, you can edit the cluster definition to disable resizeInUseVolumes. Then re-create every instance in a different PVC.

For example, re-create the storage for cluster-example-3:

$ kubectl delete pvc/cluster-example-3 pod/cluster-example-3

Important

If you created a dedicated WAL volume, both PVCs must be deleted during this process. The same procedure applies if you want to regenerate the WAL volume PVC. You can do this by also disabling resizeInUseVolumes for the .spec.walStorage section.
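The following sketch disables resizeInUseVolumes on both the data and the WAL sections of a Cluster. The cluster-example name matches the pods listed in this section. The 2Gi values are illustrative and stand for the new, larger size that the re-created PVCs should have.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    storageClass: standard
    size: 2Gi                  # illustrative new size for re-created PVCs
    resizeInUseVolumes: false  # leave existing data PVCs untouched
  walStorage:
    storageClass: standard
    size: 2Gi                  # illustrative new size for re-created PVCs
    resizeInUseVolumes: false  # leave existing WAL PVCs untouched
```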

For example, if a PVC dedicated to WAL storage is present:

$ kubectl delete pvc/cluster-example-3 pvc/cluster-example-3-wal pod/cluster-example-3

Having done that, the operator orchestrates creating another replica with a resized PVC:

$ kubectl get pods
NAME                           READY   STATUS      RESTARTS   AGE
cluster-example-1              1/1     Running     0          5m58s
cluster-example-2              1/1     Running     0          5m43s
cluster-example-3-join-v2      0/1     Completed   0          17s
cluster-example-3              1/1     Running     0          10s

Volume reduction

Kubernetes does not provide an API to shrink a PVC, and the EDB Postgres® AI for CloudNativePG™ Cluster validating webhook rejects any attempt to decrease .spec.storage.size, .spec.walStorage.size, or any tablespace storage size in .spec.tablespaces. You can still reduce the storage of a cluster, but only by re-creating each instance with a smaller volume, as described below.

Warning

EDB Postgres® AI for CloudNativePG™ Cluster does not support automated volume shrinking, as it is a delicate operation that can lead to data loss if performed incorrectly. For the time being, it can only be achieved manually, through the supervised procedure described below. This procedure requires you to temporarily disable the validating webhook. While validation is disabled, the operator accepts spec changes that would normally be rejected, including unsafe or destructive ones. Proceed with caution and at your own risk, and re-enable validation as soon as possible.

Before you start, make sure the cluster's current data, WAL, and any tablespace data comfortably fit within the new, smaller sizes. If they don't, the instances recreated on smaller volumes can fail to rejoin or quickly run out of space.

To reduce the size of the persistent volumes:

1. Disable the validating webhook by setting the k8s.enterprisedb.io/validation: disabled annotation on the Cluster, set .spec.storage.size (and, if present, .spec.walStorage.size or the tablespace storage size in .spec.tablespaces) to the new, smaller value, and increase .spec.instances by 1 to provide a spare instance during the rollout.
2. Re-enable validation by removing the k8s.enterprisedb.io/validation annotation (or setting it to enabled). The new, smaller size is now stored in the spec and is applied to every instance the operator re-creates from this point on. Existing instances keep their current volumes. For each one, the operator logs an informational cannot decrease storage requirement message until that instance is re-created. This is expected and harmless.
3. Destroy one standby that still has a volume of the old size. The operator provisions a replacement instance on the new, smaller volume. Because instance serials are recycled, the replacement reuses the name of the instance you destroyed:
```
kubectl-cnp destroy CLUSTER INSTANCE
```
4. Wait for the operator to create the replacement instance and for it to become healthy.
5. Repeat steps 3 and 4 for every remaining standby that still has an old-size volume.
6. Promote one of the newly created standbys so that the current primary, which still has an old-size volume, is demoted to a standby:
```
kubectl-cnp promote CLUSTER INSTANCE
```
7. Destroy the former primary (now a standby with an old-size volume) so the operator provisions its replacement on the new, smaller volume, and wait until all instances are healthy.
8. Decrease .spec.instances back to its original value.
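For reference, the change made when disabling the validating webhook at the start of the procedure might look like the following. The example uses a hypothetical three-instance cluster-example whose volumes are shrunk from 10Gi (data) and 2Gi (WAL) to 5Gi and 1Gi. The sizes are illustrative only. Choose values that still leave ample headroom for your data, and remove the annotation again when you re-enable validation.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example
  annotations:
    # Temporarily disables the validating webhook; remove it afterwards
    k8s.enterprisedb.io/validation: disabled
spec:
  instances: 4                 # original value (3) plus one spare instance
  storage:
    size: 5Gi                  # reduced from a hypothetical 10Gi
  walStorage:
    size: 1Gi                  # reduced from a hypothetical 2Gi; only if present
```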

Static provisioning of persistent volumes

EDB Postgres® AI for CloudNativePG™ Cluster was designed to work with dynamic volume provisioning. This capability allows storage volumes to be created on demand when requested by users by way of storage classes and PVC templates. See Re-creating storage.

However, in some cases, Kubernetes administrators prefer to manually create storage volumes and then create the related PersistentVolume objects for their representation inside the Kubernetes cluster. This is also known as pre-provisioning of volumes.

Important

We recommend that you avoid pre-provisioning volumes, as doing so affects the high availability and self-healing capabilities of the operator. It breaks the fully declarative model on which EDB Postgres® AI for CloudNativePG™ Cluster was built.

To use a pre-provisioned volume in EDB Postgres® AI for CloudNativePG™ Cluster:

Manually create the volume outside Kubernetes.
Create the PersistentVolume object to match this volume using the correct parameters as required by the actual CSI driver (that is, volumeHandle, fsType, storageClassName, and so on).
Create the Postgres Cluster using, for each storage section, a coherent pvcTemplate section that can help Kubernetes match the PersistentVolume and enable EDB Postgres® AI for CloudNativePG™ Cluster to create the needed PersistentVolumeClaim.
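The following minimal sketch applies these steps to a single-instance cluster. Everything CSI-specific (the driver csi.example.com, the volumeHandle, and fsType) is a hypothetical placeholder for the values your driver requires. The manual storage class name and the app label are also hypothetical. The pvcTemplate repeats the storage class, access mode, volume mode, and size of the PersistentVolume and adds a label selector. Together, these settings let Kubernetes bind the PVC that the operator creates to the pre-provisioned volume. For a larger cluster, you need one matching PersistentVolume per instance and per WAL or tablespace volume.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data-1              # hypothetical name
  labels:
    app: postgresql-static     # hypothetical label matched by the selector
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual     # hypothetical class name used only for binding
  volumeMode: Filesystem
  csi:
    driver: csi.example.com    # hypothetical CSI driver
    volumeHandle: vol-0123     # hypothetical ID of the manually created volume
    fsType: ext4
---
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: postgresql-static      # hypothetical name
spec:
  instances: 1
  storage:
    pvcTemplate:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: manual
      volumeMode: Filesystem
      selector:
        matchLabels:
          app: postgresql-static
```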

Warning

With static provisioning, it's your responsibility to ensure that Postgres pods can be correctly scheduled by Kubernetes where a pre-provisioned volume exists. (The scheduling configuration is based on the affinity rules of your cluster.) Make sure you check for any pods stuck in Pending after you deploy the cluster. If the condition persists, investigate why it's happening.

Block storage considerations (Ceph/Longhorn)

Most block storage solutions in Kubernetes, such as Longhorn and Ceph, recommend having multiple replicas of a volume to enhance resiliency. This approach works well for workloads that lack built-in resiliency.

However, EDB Postgres® AI for CloudNativePG™ Cluster integrates this resiliency directly into the Postgres Cluster through the number of instances and the persistent volumes attached to them, as explained in "Synchronizing the state".

As a result, defining additional replicas at the storage level can lead to write amplification, unnecessarily increasing disk I/O and space usage.

For EDB Postgres® AI for CloudNativePG™ Cluster usage, consider reducing the number of replicas at the block storage level to one, while ensuring that no single point of failure (SPoF) exists at the storage level for the entire Cluster resource. This typically means ensuring that a single storage host—and ultimately, a physical disk—does not host blocks from different instances of the same Cluster, in alignment with the broader shared-nothing architecture principle.

In Longhorn, you can mitigate this risk by enabling strict-local data locality when creating a custom storage class. Detailed instructions for creating a volume with strict-local data locality are available in the Longhorn documentation. This setting ensures that a pod’s data volume resides on the same node as the pod itself.
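The following minimal sketch shows such a storage class. It assumes the Longhorn CSI provisioner driver.longhorn.io and its numberOfReplicas and dataLocality parameters. The class name is hypothetical. The strict-local mode is paired with a single volume replica, in line with the recommendation above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-strict-local  # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  # One replica per volume: resiliency comes from Postgres instances
  numberOfReplicas: "1"
  # Keep the volume's data on the same node as the pod using it
  dataLocality: "strict-local"
```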

Additionally, your Postgres Cluster should have pod anti-affinity rules in place to ensure that the operator deploys pods across different nodes, allowing Longhorn to place the data volumes on the corresponding hosts. If needed, you can manually relocate volumes in Longhorn by temporarily setting the volume replica count to 2, reducing it afterward, and then removing the old replica. If a host becomes corrupted, you can use the cnp plugin to destroy the affected instance. EDB Postgres® AI for CloudNativePG™ Cluster will then recreate the instance on another host and replicate the data.
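The following sketch shows a Cluster that requires its instances to run on different nodes and uses a strict-local Longhorn class. The affinity fields (enablePodAntiAffinity, topologyKey, and podAntiAffinityType) are assumed from the operator's scheduling options. The class name longhorn-strict-local and the cluster name are hypothetical.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-longhorn       # hypothetical name
spec:
  instances: 3
  affinity:
    enablePodAntiAffinity: true
    # One instance per node, so each local volume lives on its own host
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
  storage:
    storageClass: longhorn-strict-local  # hypothetical class name
    size: 10Gi
```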

In Ceph, this can be configured through CRUSH rules, as described in the Ceph documentation. These rules aim to ensure one volume per pod per node. You can also relocate volumes by importing them into a different pool.
