Operator Capability Levels
This section provides a summary of the capabilities implemented by Cloud Native PostgreSQL, classified using the "Operator SDK definition of Capability Levels" framework.
Each capability level is associated with a certain set of management features the operator offers:
- Basic Install
- Seamless Upgrades
- Full Lifecycle
- Deep Insights
- Auto Pilot
We consider this framework as a guide for future work and implementations in the operator.
Capability level 1 involves installation and configuration of the operator. This category includes usability and user experience enhancements, such as improvements in how users interact with the operator and a PostgreSQL cluster configuration.
We consider Information Security part of this level.
The operator is installed in a declarative way using a Kubernetes manifest which defines 3 CustomResourceDefinition objects: Cluster, Backup, and ScheduledBackup.
A PostgreSQL cluster (operand) is defined using the Cluster custom resource in a fully declarative way. The PostgreSQL version is determined by the operand container image defined in the CR, which is automatically fetched from the requested registry. When deploying an operand, the operator also automatically creates the related Kubernetes resources it needs, such as pods, services, secrets, and persistent volume claims.
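As a sketch, a minimal Cluster definition might look like the following (the apiVersion is the one used by Cloud Native PostgreSQL's CRDs and may differ in your installation):

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  # 1 primary and 2 standby replicas
  instances: 3
  storage:
    size: 1Gi
```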
The operator is designed to support any operand container image with PostgreSQL inside. By default, the operator uses the latest available minor version of the latest stable major version supported by the PostgreSQL Community and published on Quay.io by EnterpriseDB. You can use any compatible image of PostgreSQL supporting the primary/standby architecture directly by setting the imageName attribute in the CR. The operator also supports imagePullSecrets to access private container registries.
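A sketch of both attributes, assuming a hypothetical private registry and secret name:

```yaml
spec:
  # Any compatible PostgreSQL image supporting the primary/standby architecture
  imageName: registry.example.com/postgresql:13.3
  imagePullSecrets:
    # Hypothetical secret holding the registry credentials
    - name: private-registry-secret
```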
Instead of relying on an external tool such as Patroni or Stolon to coordinate PostgreSQL instances in the Kubernetes cluster pods, the operator injects the operator executable inside each pod, in a file named /controller/manager. The application is used to control the underlying PostgreSQL instance and to reconcile the pod status with the instance itself based on the PostgreSQL cluster topology. The instance manager also starts a web server that is invoked by the kubelet for probes. Unix signals invoked by the kubelet are filtered by the instance manager and, where appropriate, forwarded to the postgres process for fast and controlled reactions to external events. The instance manager is written in Go and has no external dependencies.
Storage is a critical component in a database workload. Taking advantage of Kubernetes native capabilities and resources in terms of storage, the operator gives users enough flexibility to choose the right storage for their workload requirements, based on what the underlying Kubernetes environment can offer. This implies choosing a particular storage class in a public cloud environment or fine-tuning the generated PVC through a PVC template in the CR's storage section.
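For example, a sketch of the storage section, assuming a storage class named standard exists in the underlying Kubernetes environment:

```yaml
spec:
  storage:
    # Pick the storage class that best fits the workload requirements
    storageClass: standard
    size: 10Gi
```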
The operator automatically detects replicas in a cluster through a single parameter called instances. If set to 1, the cluster comprises a single primary PostgreSQL instance with no replica. If higher than 1, the operator manages instances - 1 replicas, including high availability through automated failover and rolling updates through switchover operations.
The operator is designed to manage a PostgreSQL cluster with a single
database. The operator transparently manages access to the database through
three Kubernetes services automatically provisioned and managed for read-write,
read, and read-only workloads.
Using the convention over configuration approach, the operator creates a database called app, by default owned by a regular Postgres user with the same name. Both the database name and the user name can be specified if required.
Although no configuration is required to run the cluster, users can customize
both PostgreSQL run-time configuration and PostgreSQL Host-Based
Authentication rules in the
postgresql section of the CR.
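A hedged sketch of these customizations, combining the bootstrap options for the database and owner names with run-time parameters and HBA rules (values are illustrative):

```yaml
spec:
  bootstrap:
    initdb:
      # Defaults to "app" for both if omitted (convention over configuration)
      database: app
      owner: app
  postgresql:
    parameters:
      max_connections: "200"
    pg_hba:
      - host all all 10.0.0.0/16 md5
```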
For InfoSec requirements, the operator does not need privileged mode for the execution of containers and access to volumes both in the operator and in the operand.
The operator supports basic pod affinity/anti-affinity rules to deploy PostgreSQL pods on different nodes, based on the selected topologyKey (for example zone). Additionally, it supports node affinity through the nodeSelector configuration attribute, as expected by Kubernetes.
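A sketch of the affinity section, assuming nodes carry a hypothetical workload label:

```yaml
spec:
  affinity:
    # Spread PostgreSQL pods across failure domains (for example, zones)
    enablePodAntiAffinity: true
    topologyKey: topology.kubernetes.io/zone
    # Hypothetical label used to pin pods to dedicated nodes
    nodeSelector:
      workload: database
```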
The operator comes with support for license keys, with the possibility to programmatically define a default behavior in case of the absence of a key. Cloud Native PostgreSQL has been programmed to create an implicit 30-day trial license for every deployed cluster. License keys are signed strings that the operator can verify using an asymmetric key technique. The content is a JSON object that includes the type, the product, the expiration date and, if required, the cluster identifiers (namespace and name), the number of instances, and the credentials to be used as a secret by the operator to pull down an image from a protected container registry. Beyond the expiration date, the operator will stop any reconciliation process until the license key is restored.
The operator continuously updates the status section of the CR with the observed status of the cluster. The entire PostgreSQL cluster status is continuously monitored by the instance manager running in each pod: the instance manager is responsible for applying the required changes to the controlled PostgreSQL instance to converge to the required status of the cluster (for example, if the cluster status reports that pod cluster-example-1 is the target primary, cluster-example-1 needs to promote itself while the other pods need to follow cluster-example-1). The same status is used by Kubernetes client applications to provide details, including the cnp plugin for kubectl and the OpenShift dashboard.
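By way of illustration, a simplified sketch of the kind of information the status section reports (field names may vary between versions):

```yaml
status:
  instances: 3
  readyInstances: 3
  currentPrimary: cluster-example-1
  targetPrimary: cluster-example-1
  phase: Cluster in healthy state
```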
The operator automatically creates a certification authority for itself. It creates and signs with the operator certification authority a leaf certificate to be used by the webhook server, to ensure safe communication between the Kubernetes API Server and the operator itself.
The operator automatically creates a certification authority for every PostgreSQL cluster, which is used to issue and renew TLS certificates for the authentication of streaming replication standby servers and applications (instead of passwords). This certification authority signs every certificate of the cluster, and certificates can be issued with the cnp plugin for kubectl.
The operator transparently and natively supports TLS/SSL connections to encrypt client/server communications for increased security using the cluster's certification authority.
The operator relies on TLS client certificate authentication to authorize streaming replication connections from the standby servers, instead of relying on a password (and therefore a secret).
The operator enables users to apply changes to the PostgreSQL configuration in the postgresql section of the Cluster resource YAML and makes sure that all instances are properly reloaded or restarted, depending on the configuration option. Current limitations: changes with ALTER SYSTEM are not detected, meaning that the cluster state is not enforced, and proper restart order is not implemented with hot standby sensitive parameters.
The operator can be installed through a Kubernetes manifest via kubectl apply, to be used in a traditional Kubernetes installation in public and private cloud environments. Additionally, it can be deployed through the Operator Lifecycle Manager (OLM) from OperatorHub.io and the OpenShift Container Platform by Red Hat.
The operator supports the convention over configuration paradigm, deciding
standard default values while allowing users to override them and customize
them. You can specify a deployment of a PostgreSQL cluster using the Cluster CRD in a couple of YAML code lines.
Capability level 2 is about enabling updates of the operator and the actual workload, in our case PostgreSQL servers. This includes PostgreSQL minor release updates (security and bug fixes normally) as well as major online upgrades.
You can upgrade the operator seamlessly as a new deployment. A change in the operator does not require a change in the operand - thanks to the instance manager's injection. The operator can manage older versions of the operand.
The operand can be upgraded using a declarative configuration approach as
part of changing the CR and, in particular, the
imageName parameter. The
operator prevents major upgrades of PostgreSQL while making it possible to go
in both directions in terms of minor PostgreSQL releases within a major
version (enabling updates and rollbacks).
In the presence of standby servers, the operator performs rolling updates
starting from the replicas by dropping the existing pod and creating a new
one with the new requested operand image that reuses the underlying storage.
Depending on the value of the primaryUpdateStrategy, the operator proceeds with a switchover before updating the former primary (unsupervised) or waits for the user to manually issue the switchover procedure (supervised) via the cnp plugin for kubectl.
Which setting to use depends on the business requirements as the operation
might generate some downtime for the applications, from a few seconds to
minutes based on the actual database workload.
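For instance, a minor-version rolling update could be triggered declaratively as in the following sketch (image tags are illustrative):

```yaml
spec:
  # Changing the minor version tag (for example, from 13.3 to 13.4)
  # triggers a rolling update of the replicas first
  imageName: quay.io/enterprisedb/postgresql:13.4
  # "unsupervised" performs the final switchover automatically;
  # "supervised" waits for a manual switchover via the cnp plugin
  primaryUpdateStrategy: unsupervised
```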
At any time, convey the cluster's high availability status, for example, Failover in progress, Switchover in progress, or Upgrade in progress.
Capability level 3 requires the operator to manage aspects of business continuity and scalability. Disaster recovery is a business continuity component that requires that both backup and recovery of a database work correctly. While as a starting point, the goal is to achieve RPO < 5 minutes, the long term goal is to implement RPO=0 backup solutions. High Availability is the other important component of business continuity that, through PostgreSQL native physical replication and hot standby replicas, allows the operator to perform failover and switchover operations. This area includes enhancements in:
- control of PostgreSQL physical replication, such as synchronous replication, (cascading) replication clusters, and so on;
- connection pooling, to improve performance and control through a connection pooling layer with pgBouncer.
The operator has been designed to provide application-level backups using PostgreSQL’s native continuous backup technology based on physical base backups and continuous WAL archiving. Specifically, the operator currently supports only backups on AWS S3 or S3-compatible object stores and gateways like MinIO.
WAL archiving and base backups are defined at the cluster level, declaratively, through the backup parameter in the cluster definition, by specifying an S3 protocol destination URL (for example, to point to a specific folder in an AWS S3 bucket) and, optionally, a generic endpoint URL. WAL archiving, a prerequisite for continuous backup, does not require any further action from the user: the operator will automatically and transparently set the archive_command to rely on barman-cloud-wal-archive to ship WAL files to the defined endpoint. Users can decide the compression algorithm.
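A hedged sketch of the backup configuration, assuming a bucket named my-bucket and a secret named aws-creds holding the S3 credentials:

```yaml
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/cluster-example/
      # Optional generic endpoint URL, for example for MinIO gateways
      endpointURL: https://minio.example.com:9000
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
      wal:
        # Compression algorithm for archived WAL files
        compression: gzip
```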
You can define base backups in two ways: on-demand (through the Backup custom resource definition) or scheduled (through the ScheduledBackup custom resource definition, using a cron-like syntax). They both rely on barman-cloud-backup for the job (distributed as part of the application container image) to relay backups to the same endpoint, alongside WAL files. Both barman-cloud-wal-archive and barman-cloud-backup are distributed in the application container image under GNU GPL 3 terms.
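As a sketch, an on-demand backup and a nightly scheduled backup for a cluster named cluster-example could be declared as follows:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Backup
metadata:
  name: backup-example
spec:
  cluster:
    name: cluster-example
---
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: ScheduledBackup
metadata:
  name: backup-example-nightly
spec:
  # Cron-like syntax (with seconds): every day at midnight
  schedule: "0 0 0 * * *"
  cluster:
    name: cluster-example
```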
The operator enables users to bootstrap a new cluster (with its settings)
starting from an existing and accessible backup taken using
barman-cloud-backup. Once the bootstrap process is completed, the operator
initiates the instance in recovery mode and replays all available WAL files
from the specified archive, exiting recovery and starting as a primary.
Subsequently, the operator will clone the requested number of standby instances
from the primary.
The operator enables users to create a new PostgreSQL cluster by recovering an existing backup to a specific point-in-time, defined with a timestamp, a label or a transaction ID. This capability is built on top of the full restore one and supports all the options available in PostgreSQL for PITR.
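A sketch of a bootstrap-from-backup definition; the optional recoveryTarget stanza turns the full restore into a point-in-time recovery (names and timestamp are illustrative):

```yaml
spec:
  bootstrap:
    recovery:
      backup:
        # Existing Backup resource to restore from
        name: backup-example
      # Optional: stop recovery at a specific point in time
      recoveryTarget:
        targetTime: "2021-04-21 12:00:00.000000+00"
```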
Achieve Zero Data Loss (RPO=0) in your local High Availability Cloud Native PostgreSQL cluster through quorum-based synchronous replication support. The operator provides two configuration options that control the minimum and maximum number of expected synchronous standby replicas available at any time. The operator will react accordingly, based on the number of available and ready PostgreSQL instances in the cluster, as in the sketch below.
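The following sketch shows the two options; the comment paraphrases the resulting behavior and should be checked against the replication documentation for your version:

```yaml
spec:
  instances: 3
  # The operator keeps the number of synchronous standbys q within
  # minSyncReplicas <= q <= maxSyncReplicas, adapting q to the number of
  # ready replicas, and renders it in PostgreSQL as:
  #   synchronous_standby_names = 'ANY q (pod1, pod2, ...)'
  minSyncReplicas: 1
  maxSyncReplicas: 2
```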
The operator defines liveness and readiness probes for the Postgres containers that are then invoked by the kubelet. They are mapped respectively to the /healthz and /readyz endpoints of the web server managed directly by the instance manager. They both use Go to connect to the cluster and issue a simple query (;) to verify that the server is ready to accept connections.
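The probes the operator generates resemble the following sketch (the port number is an assumption based on the instance manager's default web server port):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
readinessProbe:
  httpGet:
    path: /readyz
    port: 8000
```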
The operator supports rolling deployments to minimize the downtime and, if a PostgreSQL cluster is exposed publicly, the Service will load-balance the read-only traffic only to available pods during the initialization or the update.
The operator allows users to scale up and down the number of instances in a
PostgreSQL cluster. New replicas are automatically started up from the
primary server and will participate in the cluster's HA infrastructure.
The CRD declares a "scale" subresource that allows the user to use the
kubectl scale command.
The operator creates a
PodDisruptionBudget resource to limit the number of
concurrent disruptions to one. This configuration prevents the maintenance
operation from deleting all the pods in a cluster, allowing the specified
number of instances to be created.
The PodDisruptionBudget will be applied during the node draining operation,
preventing any disruption of the cluster service.
While this strategy is correct for Kubernetes Clusters where
storage is shared among all the worker nodes, it may not be the best solution
for clusters using Local Storage or for clusters installed in a private
cloud. The operator allows users to specify a maintenance window and configure the reaction to any underlying node eviction. The reusePVC option in the maintenance window section enables users to specify the strategy to be used: allocate new storage in a different PVC for the evicted instance or wait for the underlying node to be available again.
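A sketch of the maintenance window configuration:

```yaml
spec:
  nodeMaintenanceWindow:
    inProgress: true
    # false: allocate new storage in a different PVC for the evicted instance
    # true: wait for the underlying node (and its storage) to come back
    reusePVC: false
```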
When the operator needs to create a pod that has been deleted by the user or
has been evicted by a Kubernetes maintenance operation, it reuses the
PersistentVolumeClaim if available, avoiding the need
to re-clone the data from the primary.
The operator allows administrators to control and manage resource usage by the cluster's pods, through the resources section of the manifest. In particular, requests and limits values can be set for both CPU and RAM.
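A sketch of the resources section (values are illustrative):

```yaml
spec:
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
```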
Capability level 4 is about observability: in particular, monitoring, alerting, trending, log processing. This might involve the use of external tools such as Prometheus, Grafana, Fluent Bit, as well as extensions in the PostgreSQL engine for the output of error logs directly in JSON format.
The instance manager provides a pluggable framework and, via its own
web server, exposes an endpoint to export metrics for the
Prometheus monitoring and alerting tool.
The operator supports custom monitoring queries defined as
Secret objects using a syntax that is compatible with
postgres_exporter for Prometheus.
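A hedged sketch of a custom query in the postgres_exporter-compatible syntax (the query and metric names are illustrative):

```yaml
pg_replication:
  query: >-
    SELECT CASE WHEN NOT pg_is_in_recovery() THEN 0
    ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
    END AS lag
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind primary in seconds"
```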
Record major events as expected by the Kubernetes API, such as creating resources, removing nodes, upgrading, and so on. Events can be displayed through the kubectl describe and kubectl get events commands.
Capability level 5 is focused on automated scaling, healing, and tuning through the discovery of anomalies and insights that emerge from the observability layer.
In case of detected failure on the primary, the operator will change the
status of the cluster by setting the most aligned replica as the new target
primary. As a consequence, the instance manager in each alive pod will
initiate the required procedures to align itself with the requested status of
the cluster, by either becoming the new primary or by following it.
In case the former primary comes back up, the same mechanism will avoid a split-brain by preventing applications from reaching it, running pg_rewind on the server and restarting it as a standby.
In case the pod hosting a standby has been removed, the operator initiates the procedure to recreate a standby server.