Skip to content
Webinar Series: Oracle Migration • Sept 13, 20 & 27 • Register Now

What Is Database-as-a-Service (DBaaS)? Tutorial on How to Use Managed Database Service for PostgreSQL

Marc Linster5/16/2022

1 What is Database as a Service?    
2 How Does Database-as-a-Service Work?    
2.1 The Shared Responsibility Model    
2.2 The Control Plane    
3 How is Database-as-a-Service different from On-Premises Database Management Systems?    
4 Is DBaaS considered to be SaaS, PaaS, or IaaS?    
5 Comparing Virtual Machines, Kubernetes, and DBaaS    
6 What are the advantages and disadvantages of DBaaS?    
7 What are the DBaaS service variations?    
8 What is PostgreSQL as a Service?    
9 Common risks when moving to the cloud    

9.1 Support risk    
9.2 Service risk    
9.3 Technology stagnation risk    
9.4 Cost risk    
9.5 Lock-in risk    
9.6 Planning for risk    
10 DBaaS Tools    
10.1 Development Tools    
10.2 Database Compatibility    
10.3 APIs and CLIs    
10.3.1 BigAnimal API Cluster Management    
10.3.1.1 Authentication    
10.3.1.2 Creating Clusters    
10.3.1.3 Listing Clusters    
10.3.1.4 Deleting Clusters    
10.4 Continuous Integration/Continuous Delivery    
10.4.1 Creating a Project in Liquibase Hub    
10.4.2 Downloading your Liquibase jar file    
10.4.3 Creating your liquibase.properties file    
10.4.4 Generating Changelog    
10.4.5 Creating the changeset file    
10.4.6 Update dbchangelog.xml file    
10.4.7 Registering the Changelog    
10.4.8 Pushing changes to the DB    
10.4.9 Rollback changes to the DB    
10.5 Infrastructure as Code    
11 Security and compliance    
11.1 Securing Postgres in the Cloud    
11.1.1 Connecting    
11.1.2 Transport Encryption    
11.1.3 Authentication    
11.1.4 Roles    
11.1.4.1 Role attributes    
11.1.4.2 Password complexity    
11.1.4.3 Password profiles    
11.1.4.4 SET ROLE    
11.1.4.5 Monitoring Roles    
11.1.5 Data Access Control    
11.1.5.1 GRANT & REVOKE   
11.1.5.2 RLS    
11.1.5.3 Views    
11.1.5.4 Security barriers    
11.1.5.5 Security Definer Functions    
11.1.5.6 Data redaction    
11.1.6 Encryption    
11.1.6.1 pgcrypto    
11.1.6.1.1 Hashing in pgcrypto    
11.1.6.1.2 Encryption in pgcrypto    
11.2 Compliance considerations    
11.2.1 General considerations: SOC 1, SOC 2, SOC 3, and ISO 27001    
11.2.2 Industry or use case specific considerations: GDPR DPA, HIPAA BAA, PCI DSS, Fedramp    
11.3 Is it safe to use Postgres in the cloud?    
12 How to choose the best DBaaS for your needs?    
13 Migrating to the cloud    

13.1 What is being moved?    
13.2 How is it currently hosted?    
13.3 Migrating databases to the cloud: The moving process    
13.4 Hardware Considerations    
14 DBaaS is not an all-or-nothing solution    
15 Conclusion    
16 Acronyms and Abbreviations    

 

1 What is Database as a Service?

According to an IDC Survey Spotlight, Experience in Migrating Databases to the Cloud, 63% of enterprises are actively migrating their databases to the cloud, and another 29% are considering doing so within the next three years. Database as a Service (DBaaS) is one of the preferred alternatives for cloud migration. According to a 2022 EDB survey, 50% of respondents were planning to use a DBaaS for the Postgres cloud migration; 39% were considering containers and Kubernetes, and 11% were planning to migrate to virtual machines. This article will focus on DBaaS as the leading cloud migration alternative.

DBaaS is a cloud computing service model that allows users and organizations to create, manage, maintain, and query databases in the cloud without the need for physical hardware and software installation, and without assuming many operational tasks, such as monitoring, backup and recovery, and patching. These operational tasks are assumed by the Cloud Database Operator (CDO). The CDO also maintains the hardware and software, and guarantees a committed service level availability (SLA).

Users are completely focused on achieving your business priorities and optimizing the database for their applications, instead of investing resources into managing, monitoring, high availability and backups, security and compliance, OS and database patching, or network configuration.

Other labels often used for DBaaS include Database Platform as a Service (DBPaaS) or fully managed database. DBaaS adoption is growing rapidly, and all major cloud platforms offer DBaaS solutions, e.g., AWS RDS, Azure Database, or Google Cloud SQL. A new generation of DBaaS is provided by specialized database vendors, such as MongoDB’s Atlas, EDB’s BigAnimal, or Crunchy Data’s Crunchy Bridge

While the DBaaS is obviously focused on the database, understanding the ‘as a Service’ aspect is essential. The National Institute of Standards and Technology (NIST) Definition of Cloud Computing (NIST SP 800-144) is a good starting point as it lists five essential characteristics of cloud computing service:

  • On-demand self-service - automated provisioning without human intervention  
  • Broad network access - access from anywhere with any number of platforms 
  • Resource pooling - multi-tenant utilization of hardware and software (with boundaries) to allow for greater flexibility and greater utilization of resources vs idle servers sitting in a datacenter 
  • Rapid elasticity - seamless ability to rapidly scale and release provisioned resources based on demand  
  • Measured service - resource utilization is monitored, controlled, and reported by the cloud service provider, which provides transparency for both the provider and consumer of the utilized service, and the consumer is billed for what is used 

These requirements align well with the needs of most businesses when they think about a move to the cloud:

  • Focus on value-add activities, as opposed to ‘keeping the lights on’ grunt work
  • Support for an agile model where resources can be made available quickly
  • Pay for what you need

 2 How Does Database-as-a-Service Work?

Two components are essential to the understanding of how a DBaaS works:

  • The shared responsibility model outlines the obligations of the CDO and what activities are assumed by the user
  • The control plane defines how the CDO’s management tools interact with the user’s databases to make sure the databases are available, patched, backed up, and monitored

2.1 The Shared Responsibility Model

The concept of shared responsibility dictates that both the CDO and the end-user have some accountability to ensure that the DBaaS is deployed, configured, maintained, and that it functions as intended.  The cloud provider takes control of various maintenance tasks such as database patching, hardware upgrades, high availability, and backup/restore; whereas, the consumer/end-user retains responsibility for query performance, password management, and resource allocation/selection (determining the hardware that is suitable for their expected workload).  The division of the accountability/responsibility can vary. Different cloud offerings may draw different boundaries but the concept of “shared” responsibility by both parties is key to the DBaaS.

For example, in most DBaaS the provider is responsible for hardware maintenance, networking, OS installation, OS patching, database software installation and patching, monitoring and availability of ping (network access), pipe (network performance), and power. They are also responsible for adherence to key compliance standards such as SOC 1 and 2, PCI, and if applicable, HIPAA, Fedramp, etc. The user is responsible for defining and maintaining databases, tables, queries, indexes, stored procedures, and other data elements.

2.2 The Control Plane

The control plane connects the domain of the user with the domain of the CDO. It allows the CDO’s Service Reliability Engineers (SRE) to monitor and manage the user’s databases and the underlying cloud resources, such as compute, storage, and networking. The control plane also securely executes commands that the user has input into the DBaaS GUI or API.

biganimal service reliability engineering

The Control Plane of EDB’s BigAnimal DBaaS uses EDB’s Kubernetes operators

The control plane is essential to the DBaaS’ operation, as it implements the shared responsibility model, and it is at the heart of the key ‘as a Service’ characteristics of the DBaaS (on demand, rapid elasticity, pooled resources, and metering).

3 How is Database-as-a-Service different from On-Premises Database Management Systems?

On premises database management systems are typically installed in virtual machines or on bare metal servers. Those implementations often leverage Infrastructure as Code (IoC) approaches from Ansible, Terraform, Chef, or Puppet for the initial software installation and configuration. While that accelerates the initial installation, IaC doesn’t monitor the hardware and software, and it doesn’t automatically address failures - a key requirement for any DBaaS that wants to offload operational responsibilities from the user. Resource pools are also usually much more constrained, which results in limited elasticity.

IaC-based approaches do not meet all the business drivers for the move to a DBaaS: Agility, Innovation, Data Center Closures, Global Markets, and Focus on Value-Add Activities.

Key drivers for the enterprise move to the cloud

Key drivers for the enterprise move to the cloud

On-premises deployments are not truly agile. IaC will reduce deployment time frames somewhat, but the customer still needs to provide compute, storage, and networking first, which limits self-service and elasticity. They also do not support consumption based licenses in most cases.

Similar issues limit the innovation aspects. Cloud DBaaS offer a wide array of services which allows for experimentation and use of specialty technology for single projects. This is essential for trying new ideas and supporting a ‘Fail fast’ type of approach. On prem deployments usually require a bigger commitment to technology in terms of licenses, hardware, and operational skills, which makes it much harder to quickly try something new.

Globalization is another driver for the move to the cloud. Data needs to be stored in close proximity to the user for performance and compliance reasons. On premises models for these needs are prohibitively costly and are too complex to meet globalization demands.

Well-designed on-premises deployments that use reference architectures and IaC technology can automate many tasks, but in-house resources still need to monitor, manage, rack and stack, patch the software, etc. More and more enterprises want their resources focused on innovation and business-oriented value add, such as data stewardship and analytics, instead of behind the scenes grunt work that keep the lights on.

4 Is DBaaS considered to be SaaS, PaaS, or IaaS?

In this section we will compare DBaaS to Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

NIST SP 800-145 defines the three models as follows:

Software as a Service (SaaS). The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user- specific application configuration settings.

Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

Databases, especially when one considers the power of the SQL language, can be considered a type of Platform as a Service. Some aspects could be considered SaaS, such asGUI-based access but for the most part a DBaaS falls squarely into the PaaS category.

5 Comparing Virtual Machines, Kubernetes, and DBaaS

There are three ways to take a database to the public cloud: (1) use virtual machines, potentially in combination with IaC solutions such as Ansible; (2) use containers that are orchestrated with Kubernetes; or (3) use a full blown DBaaS, such as EDB’s BigAnimal or AWS’ RDS.

Three options to take a database to the cloud: Virtual machines, containers, or DBaaS

Three options to take a database to the cloud: Virtual machines, containers, or DBaaS

A recent survey on LinkedIn showed that DBaaS is the more popular option for moving to the cloud, but VMs on IaaS and Containers with K8s also have significant success in the market:

Q: How are you moving your databases to the cloud?

Answer % of reponses
Virtual Machines on IaaS 10%
K8s and Containers 35%
Database as a Service 55%

While DBaaS is the most popular, we want to be sure of what degree each one of these meets the ‘as a Service’ requirements outlined above. As a breakdown:

  VMs on IaaS Containers with K8s DBaaS
On demand No (Infrastructure as Code will help) Improved (Level III+ K8s operator automates failover, upgrades, backup/recovery) Yes
Broad network access Yes Yes Yes
Resource pooling No Yes Yes
Rapid elasticity No (Infrastructure as Code will help) Improved (Level V K8s operator will address database elasticity) Yes
Measured service No (IaaS resource consumption only) Improved (Level IV K8s operator will address database metering) Yes

Only DBaaS actually meets the requirements of a cloud service. While K8s represents a definite improvement over VMs in terms of the key ‘as a Service’ requirements, it still lacks the central control plane that is key to implementing the shared responsibility model.

However, there are other trade offs to consider:

  Advantages Things to consider
VMs on IaaS
  • Full control, including when to apply upgrades
  • Ultimate configurability and tuning at the database and OS level
  • Any extensions and versions
  • Highest performance and lowest 3rd party cost
  • Need infrastructure DBAs
  • Need 24 X 7 monitoring
  • You are responsible for configuration and day to day operation
  • Minimal automation
K8s on Containers
  • K8s operator automates many lifecycle tasks (installation, failover, backup/recovery, upgrade, etc.)
  • The user decides when to apply software upgrades
  • Works best when application is containerized and K8s is used for the whole stack
  • Configuration options are limited
  • Need K8s expertise and you have to buy into the K8s way of doing things
  • Need 24 X 7 monitoring
Managed Database Service
  • SRE team monitors and operates
  • SRE applies upgrades
  • You focus on innovation, they keep the lights on
  • Relinquish significant control
  • No OS tuning
  • Limited database extensions
  • The user has limited control over upgrades

6 What are the advantages and disadvantages of DBaaS?

In a DBaaS, the CDO assumes key responsibilities: manage the software, the hardware, and the network, while assuming responsibility for the service availability. That represents the biggest advantage - the DBaaS customer doesn’t have to worry about those tasks - and the biggest drawback - the customer is completely dependent on the CDO to discharge these tasks well and in a timely manner that does not interrupt the database service, which is not always the case. At the time of this writing, one of the leading DBaaS for Postgres was lagging behind on providing the latest version of open source Postgres by over two years.

Another challenge arises from the segregation of duties. As the CDO is responsible for maintenance of the operating system and the installation of the database software, the user will not be granted operating level access, which means that certain configuration parameters, logs, and software add-ons will not be accessible for tuning and diagnostics. 

7 What are the DBaaS service variations?

There are two major DBaaS variations: those that run the database in the customer’s account, and those that run in the CDO’s account. AWS RDS, Google Cloud SQL, and EDB’s BigAnimal run in the customer’s account. Mongo Atlas runs in the CDO’s account. As of this writing, Couchbase Capella and ScyllaDB provide both options.

A DBaaS running in the customer’s account, such as  their AWS account, allows the customer to track the DBaaS resource usage against their spend commitment with the CDO. This is important at the enterprise level, as many customers negotiate major volume purchases with the large cloud providers.

Many companies have also found it easier to meet compliance requirements when the database and the data are stored in their cloud accounts.

8 What is PostgreSQL as a Service? 

Postgres as a Service is a DBaaS that allows users to create, use and manage Postgres databases in the cloud. Postgres DBaaS are available on every cloud: RDS Postgres on AWS, Azure Database for Postgres on the Azure cloud; Google Cloud Postgres on Google Cloud Platform are examples of Postgres DBaaS operated by the cloud service provider (CSP). 

Specialty vendors such as EDB or Crunchy Data operate Postgres DBaaS on public clouds. These vendors excel by providing in-depth Postgres expertise as part of their DBaaS delivery.

unleash the power of postgres in the cloud

EDB’s BigAnimal is a Postgres DBaaS that combines EDB’s Postgres expertise, with the native Oracle compatibility of EDB’s Postgres Advanced Server, and the continuous availability provided by EDB’s Cloud Native Postgres Operator. At the time of this writing, BigAnimal is available on Azure, and is expected to be deployed on AWS and GCP before the end of 2022.

9 Common risks when moving to the cloud

This section discusses risks customers may unwittingly encounter when moving their database to a database as a service (DBaaS) in the cloud, especially when the DBaaS leverages open source database software such as Apache Cassandra, MariaDB, MySQL, Postgres, or Redis.

We classify these risks into several categories: support, service, technology stagnation, cost, and lock-in. Moving to the cloud without sufficient diligence and risk mitigation can lead to significant cost overruns and project delays, and more importantly, may mean that enterprises do not get the expected business benefits from cloud migration.

9.1 Support risk

Customers running software for production applications need support, whether they run in the cloud or on premises. Support for enterprise-level software must cover two aspects: expert advice on how to use the product correctly, especially in challenging circumstances, and quickly addressing bugs and defects that impact production or the move to production. 

For commercial software, a minimal level of support is bundled with the license. Due to the permissive nature of many open source licenses, there are many opportunities for cloud database providers to not invest sufficiently in the open source community to address bugs and provide support for open source databases when creating and operating a DBaaS.

Customers can evaluate a cloud database provider’s ability to support their cloud migration by checking the open source software release notes and identifying team members who actively participate in the project. For example, for PostgreSQL, the release notes are freely available, and they name every individual who has contributed new features or bug fixes. Other open source communities follow similar practices. 

Open source cloud database providers that are not actively involved in the development and bug fixing process cannot provide both aspects of support—advice and rapid response to problems—which presents a significant risk to cloud migration.

9.2 Service risk

Databases are complex software products. Many users need expert advice and hands-on assistance to configure databases correctly to achieve optimal performance and high availability, especially when moving from familiar on-premises deployments to the cloud. Cloud database providers that do not offer consultative and expert professional services to facilitate this move introduce risk into the process. Such providers ask the customer to assume the responsibilities of a general contractor and to coordinate between the DBaaS provider and potential professional services providers. Instead of a single entity they can consult to help them achieve a seamless deployment with the required performance and availability levels, they get caught in the middle, having to coordinate and mitigate issues between vendors.

Customers can reduce this risk by making sure they clearly understand who is responsible for the overall success of their deployment, and that this entity is indeed in a position to execute the entire project successfully. EDB’s DBA Services have been very successful at helping customers with an integrated end-to-end solution with very high availability.

9.3 Technology stagnation risk

The shared responsibility model is a key component of a DBaaS. While the user handles schema definition and query tuning, the cloud database provider applies minor version updates and major version upgrades. Not all providers are committed to upgrading in a timely manner—and some can lag behind significantly. At the time of this writing, one of the major Postgres DBaaS providers lagged behind the open source community by almost three years in their deployment of Postgres versions. While DBaaS providers can selectively backport security fixes, a delayed application of new releases can put customers in a situation where they miss out on new database capabilities, sometimes for years. Customers need to inspect a provider’s historical track record of applying upgrades to assess this exposure.

A similar risk is introduced when a proprietary cloud database provider tries to create their own fork or version of well-known open source software. Sometimes this is done to optimize the software for the cloud environment or address license restrictions. Forked versions can deviate significantly from the better-known parent or fall behind the open source version. Well-known examples of such forks or proprietary versions are Aurora Postgres (a Postgres derivative), Amazon DocumentDB (with MongoDB compatibility), and Amazon OpenSearch Service (originally derived from Elasticsearch).

Users need to be careful when adopting cloud-specific versions or forks of open source software. Capabilities can deviate over time, and the cloud database provider may or may not adopt the new capabilities of the open source version.

9.4 Cost risk

Leading cloud database services have not experienced meaningful direct price increases. However, there is a growing understanding that the nature of cloud services can drive significant cost risk, especially in the case of self-service and rapid elasticity combined with an intransparent cost model. In on-premises environments, database administrators (DBAs) and developers must optimize code to achieve performance with the available hardware. In the cloud, it can be much more expedient to ask the cloud provider to increase provisioned input/output operations per second (IOPS), compute, or memory to optimize performance. As each increase instance drives up cost, such a short-term fix is likely to have long-lasting negative cost impacts.  

Users mitigate the cost risk in two ways: (1) close supervision of the increases of IOPS, CPU, and memory to make sure they are balanced against the cost of application optimization; (2) scrutiny of the cost models of DBaaS providers to identify and avoid vendors with complex and unpredictable cost models.

9.5 Lock-in risk

Cloud database services can create a “Hotel California” effect, where data cannot easily leave the cloud again, in several ways. While data egress cost is often mentioned, general data gravity and the integration with other cloud-specific tools for data management and analysis are more impactful. Data gravity is a complex concept that, at a high level, purports that once a business data set is available on a cloud platform, more applications likely will be deployed using the data on that platform, which in turn makes it less likely that the data can be moved elsewhere without significant business impact.

Cloud-specific tools are also a meaningful driver for lock-in. All cloud platforms provide convenient and proprietary data management and analysis tools. While they help derive business value quickly, they also create lock-in.

Users can mitigate the cloud lock-in effect by carefully avoiding the use of proprietary cloud tools and by making sure they only use DBaaS solutions that support efficient data replication to other clouds.

9.6 Planning for risk

Moving databases to the cloud is undoubtedly a target for many organizations, but doing so is not risk-free. Businesses need to fully investigate and understand potential weaknesses of cloud database providers in the areas of support, services, technology stagnation, cost, and lock-in. While these risks are not a reason to shy away from the cloud, it’s important to address them up front, and to understand and mitigate them as part of a carefully considered cloud migration strategy.

EDB designed BigAnimal specifically to address the concerns listed above. BigAnimal is supported by the #1 contributor to the Postgres project who also has a strong services team, and EDB is committed to deploying the latest Postgres versions using a highly transparent cost model.

Interested in learning more about common challenges encountered when deciding to go with PostgreSQL in a cloud environment? View EDB's blog post for more information. 

10 DBaaS Tools

In this section, we will consider development tools, APIs and CLIs, CI/CD tools, and Infrastructure as Code environments.

10.1 Development Tools

Postgres ships with a powerful command line client, called psql, which is used by many developers. The psql client can be used in interactive mode and in batch mode to run more complex scripted operations. It tends to be the preferred choice for experienced PostgreSQL DBAs. psql works well with Postgres in the cloud.

pgAdmin’s ERD diagramming Tool

pgAdmin’s ERD diagramming Tool

The most popular GUI-based development environment for PostgreSQL is pgAdmin. It can be run as a standalone tool, for one developer only, or it can be configured in a server mode, where teams can collaborate on multiple database servers. The pgAdmin environment supports browsing of database objects, configurations, query development, ERD diagramming, and schema diff to identify patches that need to be rolled into production. pgAdmin connects to all leading Postgres DBaaS.

For developers coming from Oracle who want to stay with a familiar tool, Toad Edge is also an option. 

While there are other development tools that support Postgres, pgAdmin appears to be the clear favorite of the Postgres development community.

For a more complete overview of tools that are integrated with Postgres, consult The Expert’s Guide to Integrating PostgreSQL.

10.2 Database compatibility

Not every move to the cloud is a homogeneous lift-and-shift. Often the database platform changes. In EDB’s experience, Oracle is the most frequent source database when moving to Postgres in the cloud. Understanding how to migrate the schema, the data, the stored procedures, and the APIs to a DBaaS is absolutely key. EDB Postgres Advanced Server was developed to support the migration from Oracle and is one of the Postgres distributions available on BigAnimal.

For a step-by-step description of the migration process and the Oracle migration tools, please consult this whitepaper.

10.3 APIs and CLIs

For DBaaS deployments at scale, especially in DevOps or CI/CD environments where application changes and database changes need to be deployed in sync, management of the databases via an application programming interface (API) or a command line interface (CLI) is key. Deploying changes by hand is not only too slow and not scalable, it is also extremely error prone.

Other clouds, such as AWS RDS, also provide APIs and CLIs. Understanding how to correctly use them, and how to program against them, for example from Python or Go, is key to the successful use of DBaaS at scale.

The following description is taken from a series of blogs by Doug Ortiz about the use of API and CLI to manage Postgres databases on BigAnimal.

10.3.1 BigAnimal API Cluster Management

BigAnimal APIs make it possible to manage various components and resources such as Billing, Cloud Providers, Clusters, Events, Organizations, Permissions, Postgres Types, Postgres Versions, Provider, Roles, Status, Users, and Authentication. 

A BigAnimal cluster consists of databases, users, roles, and tablespaces that will be managed by a group of nodes of a predefined architecture with the goal of running Postgres/EDB Postgres Advanced Server. The API task that we are interested in the most for managing clusters is called CLUSTERS. This API allows us to create, list, and delete a cluster.

10.3.1.1 Authentication

Before we get started with cluster management in BigAnimal we must provide our credentials and authenticate against BigAnimal. We will focus on utilizing get-token.sh available in the BigAnimal Cloud Utilities GitHub repository.

This method will consist of the following steps:

  1. Download and execute get-token.sh
  2. Authenticate with the url provided by the get-token.sh script
    1. Navigate to the indicated url in a browser
    2. Click the Confirm Button for the indicated device code
    3. Confirm the successful login
    4. Retrieve the provided token after the successful login
  3. Assign the authentication token to a variable for use across the command line
    TOKEN='<token>'
10.3.1.2 Creating Clusters

The creation of a cluster requires multiple parameters to be sent via the command line. To prevent writing all those parameters, a data.json file can be created with the following content in order to send all the parameters to the clusters API.

{
  "clusterName": "do-cluster-1",
  "instanceType": {
    "cpu": 2,
    "id": "azure:Standard_E2s_v3",
    "instanceType": "E2s v3",
    "ram": 16
  },
  "password": "SuperSecretPassword",
  "pgConfigMap": [
    [
      "max_connections",
      "100"
    ]
  ],
  "postgresType": {
    "dockerImage": "",
    "id": "epas"
  },
  "postgresVersion": "13",
  "privateNetworking": true,
  "provider": {
    "cloudProviderId": "azure"
  },
  "region": {
    "regionId": "westus2"
  },
  "replicas": 3,
  "volumeProperties": "P1",
  "volumeType": {
    "configFieldsList": [
      "secretName",
      "shareName"
    ],
    "displayName": "Azure Premium Storage",
    "id": "azurepremiumstorage",
    "storageClass": "managed-premium"
  },
  "zoneRedundantHa": true
}

Once data.json is created, we can refer to it in the command line. 

The command line statement below will create a cluster.

curl -X 'POST' 'https://portal.biganimal.com/api/v1/clusters' -H "Authorization: Bearer ${TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' -d '@data.json'

A successful cluster creation will return a json array with cluster details.

[{"name":"do-cluster-1","instance":3,"currentPrimary":"p-c6b6543ehcerrn8s6rg-1", "ehcerrn8s6rg-r","phase":"Creating a new replica","phaseReason","Creating replica",0dc168873bea9bddacc45e63fdea17bdc28a7c8b450","postgresConfig":"","maintenanceWi
10.3.1.3 Listing Clusters

Listing clusters can be achieved by providing multiple parameters that were defined in the data.json utilized for creating the cluster. These parameters can be used to filter the results. A few example parameters could be name, provider, pgtype, and or pgVersion. 

Below is the command utilized to list clusters by name, provider, pgtype, and pgVersion with the parameter used to sort by.

curl -X 'GET' 'https://portal.biganimal.com/api/v1/clusters?name=DO&provider=azure&pgType=epas&pgVersion=13&sort=%2Bname' -H 'accept: application/json'  -H "Authorization: Bearer ${TOKEN}"
10.3.1.4 Deleting Clusters

Deleting clusters is accomplished by providing the pgID, which can be retrieved by listing the cluster details and seeking the ‘pgId’ value.

curl -X 'DELETE' 'https://portal.biganimal.com/api/v1/clusters/<pgId>' -H 'accept: *.*'  -H "Authorization: Bearer ${TOKEN}"

10.4 Continuous Integration/Continuous Delivery

Continuous Integration and Continuous Delivery (CICD) are key to agile development. The leading tools for CI/CD with Postgres are Flyway and Liquibase. These tools are used to manage schema and reference data changes as well as promotions of those changes from development to production via any intermediary stages in the release pipeline. They can also handle dry-runs and roll backs.

The following example shows the use of Liquibase with EDB’s BigAnimal DBaaS.

10.4.1 Creating a Project in Liquibase Hub

For the purposes of this example, a demo Liquibase project will be created that will then be used to attach all data changes with the registerChangeLog subcommand. The first step is the creation of the Liquibase project.

A Liquibase project is created by performing the steps below:

  • Navigate, and Login to Liquibase Hub - a registered account should be created
  • Click on Projects Link
  • Click on Create Project Link
  • Type in Project
    • Name
    • Description
  • Click Create Project Button

 10.4.2 Downloading your Liquibase jar file

Navigate to the Liquibase download page, review the system requirements, determine the desired method of installation, and download the file that matches your environment and specific use case.

10.4.3 Creating your liquibase.properties file

A liquibase.properties file contains the information required in order to authenticate and connect via the JDBC driver. Prior to proceeding to fill in the details for this file and commencing this step, it is recommended to have the values readily available.

The classpath location should indicate the path for its location if the Liquibase jar file is not located alongside the liquibase.properties file.

changeLogFile: dbchangelog.xml
url: jdbc:postgresql://<db_url>:<port>/<database>
username: <username>
password: <password>
classpath: postgresql-42.3.0.jar
liquibaseProLicenseKey:
<licensekey>

10.4.4 Generating Changelog

The subcommand to generate the change log file is:

liquibase generateChangeLog

The generateChangeLog subcommand creates a script file containing all of the objects within the database represented as a changeset (unit of change that can be grouped and tagged) at the moment of generateChangeLog execution. The content of this file can help perform a full rollback should the database suffer a catastrophic failure.

A successful generateChangeLog will create a file with the name indicated in the changeLogFile parameter of ‘liquibase.properties’, and will display the following output in the command line:

Liquibase command ‘generateChangeLog’ was executed successfully.

10.4.5 Creating the changeset file

Create a template file containing all the data changes to be pushed towards the database. Keep in mind that Liquibase Pro Edition allows for rollbacks per changeset, which is the format provided as an example. Each data change should follow the convention below:

<changeSet author="<user> (generated)" id="<id>-<numberofchange>">
   <insert catalogName="<databasename>"
    dbms="postgresql"
    schemaName="public"
    tableName="departments">
      <column name="deptno" value="40"/>
      <column name="dname" value="Sales"/>
      <column name="location" value="USA"/>
   </insert>
   <rollback>
   DELETE FROM public.departments WHERE deptno=40;
   </rollback>   
</changeSet>

ChangeLog files are supported in multiple formats: SQL, XML, JSON, YAML, and other formats.

10.4.6 Update dbchangelog.xml file

Before proceeding to execute the registerChangeLog subcommand, the steps below must be completed to update dbchangelog.xml:

  • Create a copy of the generated dbchangelog.xml file
  • Remove all the of the ‘<changeSet>’ tags except the first one
  • Copy all of the data changes to be applied after the first ‘<changeSet>’ tag
  • Copy ID from generated ID in generated dbchangelog.xml
  • Apply ID along with dash and number of data change: 
<id>-<numberofchange>
  • Remove the first ‘<changeSet>’ tag

10.4.7 Registering the Changelog

The Liquibase subcommand to register the changeLog is:

liquibase registerChangeLog

A prompt will appear listing the Liquibase projects available as choices from which you should:

  • Select the desired project to be attached

A successful message will be shown in the resulting terminal output:

Liquibase command 'registerChangeLog' was executed successfully.

10.4.8 Pushing changes to the DB

At this point, we are ready to push our data changes to the database with the liquibase subcommand:

liquibase update

A successful update will look like this:

Liquibase command 'update' was executed successfully.

10.4.9 Rollback changes to the DB

Rolling back specific changesets is only available in the Liquibase Pro version. The changeset number is what becomes useful when targeting a specific changeset to rollback to as illustrated next:

liquibase rollbackCount <count>

A successful rollbackCount message will be displayed:

Liquibase command 'rollbackCount' was executed successfully.

10.5 Infrastructure as Code

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools (www.wikipedia.com). In a DBaaS environment, the CDO provides compute, storage, and networking, but this infrastructure still has to be tied together and configured to create a complete data management solution. IaC solutions such as Ansible or TerraForm are available for most DBaaS environments. For example, AWS Terraform modules are available at Terraform.io, while Ansible collections for RDS are available on the Ansible Galaxy website.

11 Security and compliance

11.1 Securing Postgres in the Cloud

Securing Postgres in the cloud, especially in the context of a DBaaS, is significantly simpler than securing Postgres on premises or in an IaaS environment as physical access control, on-disk encryption, network access, and integration with identity management systems are handled by the CDO. For a complete discussion of Postgres security considerations for on-prem or IaaS deployment, see Dave Page’s excellent blog How to Secure PostgreSQL: Security Hardening Best Practices & Tips.

As with any security configuration, follow the principle of least privilege when considering how to configure your system; that is, only allow as much access as is required to implement a working system, and no more.

11.1.1 Connecting

There are two ways to connect to a Postgres server; via a Unix Domain Socket or a TCP/IP Socket. In a DBaaS environment, only TCP/IP connections are used.

In a DBaaS environment, the CDO does not provide access to the Postgres host-based access configuration file hba.conf. Instead, the CDO will provide IP address filtering methods to limit the IP network access from a public network and private networking methods, such as VPCs, to isolate the database completely from any kind of public Internet access.

Most cloud providers recommend firewalls built into the platform. This typically makes management much easier, allowing rule sets to be created that can be reused and attached to multiple servers, and allowing management through their web and command line interfaces and REST APIs.

Most cloud providers also offer Virtual Private Clouds (VPC), in which a number of servers can coexist in a single virtual environment with their own private network or networks. This type of configuration has typically become the default and makes it very easy to deploy a multi-tiered system on the public cloud, whilst keeping the non-public tiers segregated from the internet in general. The use of multiple subnets within a VPC can make it easy to further segregate servers, keeping the public tiers in a "DMZ" subnet, with only minimal access to the database servers that are in a private subnet with no direct internet connection.

CDOs offer a range of approaches to create secure and encrypted connections, see for example BigAnimal’s description for securely connecting to your database cluster.

11.1.2 Transport Encryption

If traffic to the database server is flowing across the network, it is essential practice to encrypt that traffic. Postgres uses OpenSSL to provide transport security through the use of TLS (previously SSL).

While on premises deployments require Postgres configuration to encrypt the connection, DBaaS providers significantly simplify this task too by providing predefined Certification Authorities (CA) and simple step-by step guidance to securely connect to the database cluster.

11.1.3 Authentication

During on premises or in IaaS deployments of Postgres, the pg_hba.conf file is used to select authentication mechanisms (trust, peer, md5, scram, scram-sha-256, LDAP. Kerberos, TLS) and configure parameters, such as the authentication timeout period. In a DBaaS this complexity is not required, and the CDO provides integration with the cloud’s identity and access methods. For example, BigAnimal leverages Azure’s ActiveDirectory for single sign-on.

11.1.4 Roles

Very old—practically prehistoric—versions of PostgreSQL offered users and user groups as ways of grouping user accounts together. In PostgreSQL 8.1 this system was replaced with the SQL Standard compliant roles system.

A role can be a member of other roles, or have roles that are members of it. This is sometimes referred to as "granting" a role to another role. Roles have a number of attributes that can be set, including ones that effectively make them user accounts that can be used to login to the database server. An example of granting a role to another role is shown below:

GRANT pg_monitor TO nagios;

This makes the nagios role a member of pg_monitor, thereby giving nagios access to the extended functionality reserved for superusers and members of the pg_monitor role.

11.1.4.1 Role attributes

Roles have a number of fixed attributes that can be set:

  • LOGIN - can this role be used to login to the database server?
  • SUPERUSER - is this role a superuser?
  • CREATEDB - can this role create databases?
  • CREATEROLE - can this role create new roles?
  • REPLICATION - can this role initiate streaming replication?
  • PASSWORD - does this role require a password?
  • BYPASSRLS - can this role bypass Row Level Security checks?
  • VALID UNTIL - an optional timestamp after which time the password will no longer be valid.

 Roles with the SUPERUSER flag set automatically bypass all permission checks except the right to login¹.

There are a number of other less commonly used role attributes that can also be set. See the Postgres documentation for more information.

11.1.4.2 Password complexity

PostgreSQL (as opposed to EDB Postgres Advanced Server) doesn't include any password complexity enforcement functionality by default. It does include a hook that can be used to plugin a module to do password complexity checks, however this will have no effect if the user changes their password using a pre-hashed string

11.1.4.3 Password profiles

EDB Postgres Advanced Server offers a password profile feature that can be used with the password (never use this, as the password will be transferred in plain text!), md5, and scram-sha-256 authentication methods configured in pg_hba.conf. Password profiles can be configured by the superuser and applied to one or more roles. A profile allows you to define the following options:

  • FAILED_LOGIN_ATTEMPTS: The number of failed login attempts that may occur before the role is locked out for the amount of time specified in the PASSWORD_LOCK_TIME parameter.
  • PASSWORD_LIFE_TIME: The number of days a password can be used before the user is prompted to change it.
  • PASSWORD_GRACE_TIME: The length of the grace period after a password expires until the user is forced to change their password. When the grace period expires, a user will be allowed to connect, but will not be allowed to execute any command until they update their expired password.
  • PASSWORD_REUSE_TIME: The number of days a user must wait before re-using a password.
  • PASSWORD_REUSE_MAX: The number of password changes that must occur before a password can be reused.
  • PASSWORD_VERIFY_FUNCTION: The name of a PL/SQL function that can check password complexity.

Note that if the user changes their password by supplying a new one in a pre-hashed form, then it is not possible to verify the complexity with the PASSWORD_VERIFY_FUNCTION option or re-use with the PASSWORD_REUSE_MAX option. In order to mitigate this, the PASSWORD_ALLOW_HASHED option may be set to false in the password profile.

If you're running EDB Postgres Advanced Server, consider using password profiles to ensure your users maintain strong, regularly changed passwords.

11.1.4.4 SET ROLE

The SET ROLE SQL command may be used by a user to change the user identifier of the current session to the name of any role of which they are a member. This may be used to either add to or restrict privileges on the session, and may be reset using RESET ROLE (thus making SET ROLE unsuitable for use as a multi-tenancy solution).

SET ROLE is similar to using the sudo su - <user> on a Unix-like system. It essentially allows you to run SQL commands as that other user.

By default when a role is a member of another role, it will automatically inherit the privileges of that role. In order to use SET ROLE effectively, the NOINHERIT keyword should be used when creating the role to prevent inheriting privileges automatically, requiring the use of SET ROLE to explicitly gain them when needed.

In addition to SET ROLE, there is also a SET SESSION AUTHORIZATION command which is only available to superusers². The high-level difference between them is that SET ROLE will change the current_user value but not session_user, whilst SET SESSION AUTHORIZATION will change both. In practical terms, this means that after running SET SESSION AUTHORIZATION, any subsequent SET ROLE commands will be restricted to those that the session_user could perform, regardless of the fact that the original session_user was a superuser. This allows superusers to more accurately imitate another user.

Consider using SET ROLE to allow users to temporarily elevate their privileges when and only when required to perform more potentially dangerous tasks.

11.1.4.5 Monitoring Roles

Postgres comes with a number of built-in monitoring roles which have access to functionality that was originally restricted to superusers in earlier versions of Postgres. These roles allow you to grant specific privileges to roles that are used to monitor the system, without having to give them full superuser access:

  • pg_monitor: A role which is a member of the following roles:
    • pg_read_all_settings: Read all configuration variables, even those normally visible only to superusers.
    • pg_read_all_stats: Read all pg_stat_* views and use various statistics related extensions, even those normally visible only to superusers.
    • pg_stat_scan_tables: Execute monitoring functions that may take ACCESS SHARE locks on tables, potentially for a long time.

Use the pg_monitor role to give elevated privileges to the roles that you use to monitor your database servers to avoid the need to give them superuser access. Ensure all implemented roles have the minimum privileges required to do what you need.

11.1.5 Data Access Control

Access Control Lists or ACLs are somewhat cryptic strings that are attached to objects such as tables, functions, views, and even columns in Postgres. They actually contain a list of privileges such as select, insert, execute, and so on that are granted to each role, as well as an additional optional flag (*) for each privilege which, if present, denotes that the role has the ability to grant this privilege to other roles. Finally, ACLs also include the name of the role that granted the privileges.

An example of an ACL for a table created by Joe might be as follows:

joe=arwdDxt/joe =r/joe sales_team=arw/joe

 The second section tells us that read access has been granted to PUBLIC (a special, pseudo-role that means everyone) by Joe, and the third section tells us that the Sales Team has been granted INSERT, SELECT and UPDATE privileges, again, by Joe.

The privilege flags in ACLs vary quite significantly based on the type of object in question; please review the documentation for further details.

It's useful to understand how ACLs are written in Postgres, particularly if you prefer working with command line tools which will typically show them in the internal format. Graphical tools such as pgAdmin will parse and display the ACL in a visual format that is much easier to read.

Any well designed system should use roles in conjunction with ACLs to protect the schema and data in the database. It is good practice to have the schema (i.e. the tables and other objects) be owned by a non-superuser role which is not a role that the application uses to connect to the database with or used to grant other privileges to login roles. Create group roles that reflect the permissions or roles within your application that have the required database privileges, and grant those roles to login roles as required. It is not usually a good idea to grant privileges directly to login roles used by end users, as that can quickly become difficult to manage.

Spend time fully understanding the privileges required in your system for users and applications to be able to do their jobs. Minimize privileges to only those required, separate schema ownership from data, and make use of group roles to simplify privilege management for individual login roles.

11.1.5.1 GRANT & REVOKE

ACLs are managed on objects in Postgres through the use of the GRANT and REVOKE SQL commands. In most cases when an object is created, only the owner has any privileges to use or work with that object in any way. There are some exceptions, such as that PUBLIC is grants EXECUTE permission on functions and procedures, CONNECT and TEMPORARY permission on databases, and USAGE permission on languages, data types and domains. Any of these privileges can be revoked if required.

Permission to modify or drop an object is always reserved for the owner of the object and superusers. The object ownership can be reassigned using the ALTER SQL command.

Default privileges can be overridden using the ALTER DEFAULT PRIVILEGES command for some object types. This allows you to configure the system such that certain privileges are automatically granted to roles on new objects that are created. For example, Joe in the previous example could issue a command such as the one below to grant the Sales Team insert, select, and update privileges on any new tables (but not pre-existing ones, which may need to be updated manually):

ALTER DEFAULT PRIVILEGES 
    GRANT INSERT, SELECT, UPDATE 
    ON TABLES 
    TO sales_team;

 Assuming that when a new object is created it doesn't automatically include the required privileges in the ACL, we can use GRANT and REVOKE to set up the ACL as required. To continue our previous example, Joe might use the following SQL command to grant the Sales Team permissions on the orders table:

GRANT INSERT, SELECT, UPDATE ON orders TO sales_team;

In order to revoke any automatically granted privileges, or to revoke previously granted privileges to meet changing business needs, we can use the REVOKE SQL command, e.g.

REVOKE UPDATE ON orders FROM sales_team;

 Assuming the Sales Team previously had the INSERT, SELECT, and UPDATE privileges as seen in the earlier example, this would remove the UPDATE privilege, allowing them to view and add orders, but not modify them.

It is worth noting that the use of ACLs on columns can sometimes lock users out because the wildcard in a SELECT * FROM query will not exclude the columns that users don't have access to and will instead return an access denied message for the table. In such cases the user should explicitly list the columns they have permission to SELECT from.

Having created group roles in which to organize login users, use the GRANT and REVOKE SQL commands to give the group roles the minimum level of privilege required to work. Use default privileges where appropriate as a time-saver, but be careful that doing so doesn't give more privileges than appropriate in the future. Use GRANT to give the privileges to the required login roles by making them members of the group roles.

11.1.5.2 RLS

Row Level Security or RLS is a technology available in Postgres that allows you to define policies that limit the visibility of rows in a table to certain roles. Before we dive into the details of how an RLS policy can be set up, there are two important caveats to note:

  1. Superusers and roles with the BYPASSRLS attribute always bypass row level security policies, as do table owners unless they force the policy on themselves.
  2. The existence of a row may be inferred by a user through "covert channels". For example, a unique constraint on a field such as a social security number might prevent the user inserting another row with the same value. The user cannot access the row, but they can infer that a record with that social security number already exists.

By default, row level security is turned off on tables in Postgres. It can be enabled with a command such as ALTER TABLE...ENABLE ROW LEVEL SECURITY, which will enable a restrictive policy preventing access to all data unless or until other policies have been created.

The policy itself consists of a name, the table to which the policy applies, the optional role to which it applies, and the USING clause which defines how matching or allowed rows will be identified. For example, we might limit access to orders to the Sales Team member that created them:

CREATE POLICY sales_team_policy ON orders TO sales_team
    USING (sales_person = current_user);

 We can also specify operations to which the policy applies. The following example would allow all members of the Sales Team to select any orders, but only the original sales person would be able to update or delete an order:

CREATE POLICY sales_team_select_policy ON users
    FOR SELECT
    USING (true);
CREATE POLICY sales_team_modify_policy ON users
    USING (sales_person = current_user);

By default, permissive policies are used meaning that where there are multiple policies that apply, they are combined using a boolean OR. It is also possible to use restrictive policies, where a boolean AND is used when evaluating whether or not access to a row satisfies the combined policies.

Row Level Security policies can take some effort to set up and index design must also consider them, but there are cases when it may be essential to do so, such as in a medical records system where it may be a legal requirement to restrict access to patient records to the medical staff that are directly responsible for the patient's care.

Consider the legal and ethical requirements to restrict access to specific rows in each of your tables, and design and implement RLS policies to meet those requirements where necessary. Take care to minimize covert channels by avoiding the use of sensitive data in constraints.

11.1.5.3 Views

Views are obviously useful for encapsulating commonly executed queries into an object that can be queried as if it were also a table, however they can also be useful for preventing unauthorized access to data by ensuring that roles do not have the ability to select from the underlying tables, and have to access the data from the view instead. A classic example is part of Postgres; the pg_catalog.pg_authid table contains a row for each role in the database, including a column containing the hash of the password for the role if it's been set. Because the hash is considered sensitive information, the table does not have SELECT privileges for any roles other than the superuser that the database was initialized with.

A view (pg_catalog.pg_roles)  is provided instead, which can be selected from by any user. When selecting from the view, the password is always returned as ********. This is arguably more convenient than simply using an ACL on the password column in the underlying table as that would cause a permissions error if queried with SELECT * FROM.

When using updateable views, a CHECK OPTION is available when defining the view. When omitted, the view will allow the user to insert or update records such that they wouldn't be visible through the view, otherwise the insert or update will only be allowed if the row would be visible to the user. If LOCAL CHECK OPTION is specified, row visibility is checked only against conditions on the view being used directly, but when CASCADED CHECK OPTION is used (the default, if CHECK OPTION is specified), row visibility is checked against the view being used directly as well as any other underlying views.

Consider using views over secured tables as a method of allowing access to a limited subset of the columns in the underlying table to appropriate roles.

11.1.5.4 Security barriers

Using views to restrict access to a column is quite common; however, people often also use them to restrict access to certain rows. Whilst there is certainly value in doing that, one must be mindful of one particularly nasty side effect; it's possible to trick the Postgres optimiser into leaking the hidden data! 

This is not actually a bug; it's the way the system is intended to work. Essentially what can happen is that when a query against a view is executed by the user, and the user includes a call to a very low cost function in that outer query, the optimiser may choose to run the query for every row in the data underlying the view before it applies the selectivity clauses in the view, thus allowing the function to access the restricted data. This is demonstrated nicely in a blog post by EDB’s Robert Haas.

To solve this problem, we use a security barrier, which is basically an option that is passed when the view is created that tells Postgres to always execute the qualifiers on the view first, thus ensuring that the function never sees the hidden rows.

Related to security barriers is the LEAKPROOF parameter for functions. This can only be used by superusers when creating a function, and serves to certify that the function doesn't leak any information besides the intended return value. This allows Postgres to better optimize queries where a function is used with a security barrier view, safe in the knowledge that the function won't leak any information.

Be careful when using views to hide rows to ensure that they are marked as security barriers to avoid leaking of data. Consider whether RLS might be a better solution for limiting access to specific rows.

11.1.5.5 Security Definer Functions

By default, functions and procedures in Postgres are what we call SECURITY INVOKER functions. That means that when they are called, they execute with the privileges of the calling role.

Passing the SECURITY DEFINER option when creating the function means that whenever the function is called, it will be executed with the privileges of the owner instead of the calling role. This is similar to the setuid bit in a Unix file ACL, which when set, will allow an executable to run with the permissions of its owner instead of the user that executed it.

This ability can be useful in various situations. One example might be a function that is called by a trigger on a table to write a record to an audit log, that all login and group roles are prevented from accessing in any way. It is important to carefully consider the consequences of using SECURITY DEFINER functions though - in particular, ensure that they are kept as simple as possible and perform only a single task without taking any parameters that may allow them to be used for other purposes for which they were not intended.

Consider using SECURITY DEFINER functions to provide specific functionality to roles that cannot perform those tasks directly themselves. Be careful to consider the possible ramifications and ways in which such functions could be misused, and ensure they are limited to performing only the intended tasks.

11.1.5.6 Data redaction

Data redaction is a technique that hides specific pieces of sensitive information from users by dynamically changing the value that is displayed. Whilst this can be done to some extent with views in PostgreSQL as described above, EDB Postgres Advanced Server includes native data redaction functionality.

Redaction is implemented in EPAS using data redaction policies on tables. In short, these policies specify one or more columns on a table to operate on, an expression that determines whether or not the policy should be applied, a function to perform the redaction, a scope, and any exceptions to the policy. See the documentation link above for an example showing how policies can be created.

When using EDB Postgres Advanced Server and working with sensitive data such as  credit card numbers, consider using data redaction policies to dynamically change the data displayed to a redacted form such as "XXXX XXXX XXXX XXXX 8397" to prevent users having access to sensitive data unnecessarily.

11.1.6 Encryption

In a DBaaS implementation, the CDO is responsible for on-disk encryption of the data and the backups as part of the shared responsibility model. The CDO also provides all the facilities to ensure encryption of data on the network.

Some CDOs, such as AWS RDS, also allow the use of additional encryption tools such as pgcrypto for columnar encryption.

11.1.6.1 pgcrypto

pgcrypto is a standard extension of PostgreSQL and EPAS that is used to provide SQL functions for encryption and hashing. Consider using the pgcrypto extension in your databases when you require the ability to hash or encrypt individual pieces of data to meet regulatory or similar requirements.

11.1.6.1.1 Hashing in pgcrypto

Hashing is a method of generating a cryptographically secure representation of a piece of data, typically of a fixed length (the size of which is dependent on the algorithm used). Importantly, it is non-reversible; that is, the original data cannot be extracted from the hash value — however, because the hashed value is unique to the original data, it can be used as a checksum to see if the data has been changed or to see if a user provided value matches the original value.

Hashing is most commonly used to store passwords and other sensitive information that may need to be verified, but not returned.

As an example, we can use pgcrypto to hash a password that the user will use in the future:

INSERT INTO users 
    (username, email, password)
VALUES
    ('pgsnake', 'dave.page@enterprisedb.com', crypt('new password', gen_salt('md5')));

To verify this password later we can SELECT the user record from the table;

SELECT
    *
FROM
    users
WHERE
   username = 'pgsnake' AND 
   password = crypt('entered password', password)

If a record is returned, the password was entered correctly — otherwise, it was incorrect.

It is important to note that when passwords are included in SQL commands as shown above, they may end up being written to log files on the database server. Network communications may also leak commands like these, if not protected with encryption.

Never store user passwords in plain text or obfuscated form in the database, and never use a reversible encrypted form unless the functionality of the application absolutely requires it (for example, if writing a password manager application). Use non-reversible hashing wherever possible for application passwords and other information that must be verified but not returned.

11.1.6.1.2 Encryption in pgcrypto

pgcrypto also provides functionality for encrypting data which is useful when storing information that needs to be retrieved but should be stored in a secure form. There are "raw" encryption/decryption functions provided with pgcrypto as well as PGP functions. The PGP functions are strongly encouraged over use of the raw functions which use a user-provided key directly as the cipher key, provide no integrity checking, expect the user to manage all encryption parameters, and work with bytea data - not text.

Symmetric key encryption is the easiest to use, as it doesn't require a PGP key. For example, we can demonstrate encryption and decryption of data as shown with this simple SQL command in which the inner function call encrypts the data and the outer one decrypts it:

SELECT pgp_sym_decrypt(
    pgp_sym_encrypt('Hi There', 'password'), 
    'password');

Note that the cipher text returned by the encryption function and passed to the decryption function is in bytea format.

To use public key functionality, first a key is required. This can be generated using GnuPG with a command such as:

gpg --gen-key

The PostgreSQL documentation suggests that the preferred key type is "DSA and Elgamal". Once the key is generated, you'll need to export it:

# List the keys in the keyring:
gpg --list-secret-keys
 
# Export a public key in ASCII armor format:
gpg -a --export KEYID > public.key
 
# Export a secret key in ASCII armor format:
gpg -a --export-secret-keys KEYID > secret.key

The public key can now be used to encrypt data using the SQL encryption function:

pgp_pub_encrypt('<data>', '<public key>')

Similarly, the data can later be decrypted using: 

pgp_pub_decrypt(<cipher text>, '<private key>')

Again, note that the cipher text is in bytea format.

Use encryption when storing pieces of sensitive data in the database that need to be retrieved in the future. Consider carefully whether symmetric or public key encryption is most appropriate for your use case. Public keys generally make more sense when exchanging data with others (because there's no shared secret), whilst symmetric keys may make more sense for a self-contained application.

11.2 Compliance considerations

11.2.1 General considerations: SOC 1, SOC 2, SOC 3, and ISO 27001

System and Organization Control (SOC) reports are administered by the Auditing Standards Board (ASB) of the American Institute of Certified Public Accountants (AICPA). They are audit reports that document key controls in data centers and in cloud operations. Whereas SOC 1 is focused on internal financial controls, SOC 2 is focused on Security, Availability, Confidentiality, Privacy, and Processing Integrity.. 

  • Security - how is information protected from unauthorized access?
  • Availability –  is the technology designed in such a way that it is reliably available?
  • Confidentiality – is confidentiality of customer data monitored and assured?
  • Privacy – is customer data privacy assured?
  • Processing Integrity – Is the system processing complete, valid, accurate, timely, and authorized to meet the entity’s objectives?

The details of the SOC 2 controls are defined here.

SOC 2 reports are more specific than SOC 3 and are usually shared with customers under NDA. SOC 3 reports audit the same controls as SOC 2, but are less detailed and are intended for general distribution and marketing purposes. SOC 3 is also referred to as “Trust Services Criteria for General Use Report”.

ISO/IEC 27001 puts focus on the management of information security risk, through the development and implementation of a Information Security Management System. It is comparable in scope to SOC 2. SOC compliance is more frequently used in North America; outside of North America ISO/IEC 27001 appears to be more popular. 

SOC2 is a critical compliance for many cloud offerings. It's recommended to select an offering with either SOC2 compliance or one that is ISO/IEC 27001 certified.

11.2.2 Industry or use case specific considerations: GDPR DPA, HIPAA BAA, PCI DSS, Fedramp

Federal Risk and Authorization Management Program (FedRAMP) is a security framework focused on the adoption of  cloud services by the federal government. Currently, Postgres DBaaS are available for ‘FedRAMP High’ or ‘FedRAMP Moderate’. 

Generalized Data Protection Requirements Data Processing Agreements (GDPR DPA) are required to allow CDOs to process personally identifiable information (PII) or store PII for a data processor or data controller that is required to meet GDPR requirements.

Health Insurance Portability and Accountability Act (HIPAA) Business Associate Agreements (BAA) must be in place for any provider who gets access to protected health information (PHI). A CDO that wants to manage PHI on its DBaaS must be covered by a HIPAA BAA.

Payment Card Industry Data Security Standard (PCI DSS) certification is required to allow a CDO to handle payment transactions as part of their DBaaS. 

11.3 Is it safe to use Postgres in the cloud?

A simple answer: Yes, provided that: 

  • The CDO has a SOC 2, SOC 3 or ISO/IEC 27001 certification
  • The CDO has in-depth Postgres knowledge that is readily available to the customer for planning, support, and advice
  • The customer understands the shared responsibility model
  • The security and compliance guidelines are followed
  • The CDO has a proven track record of keeping software, hardware, storage, and networking infrastructure up to industry standards

Reputable and experienced Postgres DBaaS providers, such as AWS, Azure, Crunchy Data, and EDB may in fact be more likely to provide secure and reliable Postgres services than many inhouse database operations. The reasons for that are:

  • The CDO’s SRE team continuously monitors the database operations to maintain SLAs
  • Encryption at rest and in transit is built into the infrastructure
  • SOC 2 and ISO/IEC 27001 standards are defined and audited regularly - which is not always the case with inhouse database teams

12 How to choose the best DBaaS for your needs?

A successful DBaaS brings together two key operational competencies:

  • Provisioning and operation of the infrastructure of CPU, storage and networking
  • Provisioning and operation of the database software, e.g., Postgres.

It is key to choose a vendor who is proficient in both areas and has a proven track record. Key questions to ask:

  • What is the SLA track record for the overall service in every region this service is being provided? CDOs that cannot provide this record obviously do not manage it and should be avoided.
  • Does the service leverage the latest technology, to provide the highest throughput in compute, storage and networking? Databases are extremely IO and compute intensive; making sure that the vendor is dedicated to refreshing their technology is key to achieving and maintaining performance.
  • Does the vendor have the in-depth software proficiency to provide expertise, support, maintenance, and bug fixes? Especially in the open source area, it is extremely tempting to put software on a cloud platform, without having the in-depth expertise needed to maintain it and live up to demanding enterprise standards. In the case of Postgres, for example, the vendor should have a sufficient number of resources on staff that are listed as active Postgres contributors or have actively participated in the latest version of Postgres. CDOs that do not have a back-to-back support agreement with the software vendor (for commercial software) or that are not actively engaged with the respective open source community should be avoided.
  • How easy and costly is it to get data in and out of the DBaaS? Some DBaaS make it cheap and technically simple to upload data, but prohibitively expensive and technically difficult to extract data. This creates a ‘Hotel California’ effect, where applications will never be able to leave the DBaaS because of unacceptable downtimes or cost.
  • Does the vendor have a strong roadmap for the service and a good track record of delivering? Technology stagnation can quickly become a very real problem, and combined with high egress cost or missing data extraction solutions, it can create a zombie database that will be a drag on the business.
  • Are all pricing structures transparent? For example, some vendors charge large amounts for IO, but don’t provide tools to manage or constrain IO. This can lead to unexpected cost explosions.

These guidelines should be sufficient to identify a suitable CDO. We highly recommend periodical re-evaluation of a CDO against these criteria.

13 Migrating to the cloud

The following section is derived from a blogpost written by Bruce Momjian that discusses the general database cloud migration process. View the original post for a complete look at the full article.

13.1 What is being moved?

Databases rarely operate in isolation. There are usually one or more applications that interact with the database, in addition to middleware, monitoring, backup, failover, and other tooling. The discussion in this section will focus on moving the data, and replacing the data management tools - but one cannot see a database in isolation. Most databases are tightly connected to the rest of the enterprise.

13.2 How is it currently hosted?

As stated earlier, the database could be hosted on bare metal, virtual machines, or Kubernetes/containers. Why is that important?  Well, you are going to want to replicate your current setup as much as possible in the cloud. If you are using Kubernetes/containers, you have already abstracted the dependencies of the database, so you might be able to move Kubernetes pods directly to the cloud infrastructure. If you are using virtual machines, you probably already have tooling to recreate these virtual machines in the cloud. If you are using bare metal, the database might be intricately tied to the operating system, even in ways the staff doesn’t fully grasp. As part of the migration, all database dependencies will need to be identified, and it must be clear that all dependencies can be replicated in the cloud environment.

13.3 Migrating databases to the cloud: The moving process

Moving databases is never easy. First, they contain a lot of state - often terabytes of it - and moving that much data can be slow.  Second, databases are usually part of the critical enterprise infrastructure, meaning that downtime must be minimized.

The database moving process has two parts: the data moving process, and everything else, like the binaries, configuration, and extensions.

Postgres provides several ways of moving data to another system:

  1. File system snapshot
  2. Logical dump, e.g., pg_dumpall
  3. Binary replication
  4. Logical replication

The first method requires the server to be down, while the last two methods provide continuous synchronization, which can be helpful in reducing downtime.  If using the first and third options, you will need to use the same Postgres and operating system versions in the cloud. The second option allows to move between different software versions, but may require a longer restoration process as SQL statements have to be executed to restore schema and data. Moving the other parts is mostly a mechanical process.

13.4 Hardware Considerations

Cloud providers support a plethora of storage and compute options. This represents a great advantage over on-premises deployments where changing compute or storage is often difficult or even impossible. Many cloud vendors provide annual or even multi-year pre-purchases of compute capacity - we would recommend making those agreements only after the cloud deployment has stabilized and the capacity requirements are fully understood.

14 DBaaS is not an all-or-nothing solution

The vast majority of our customers use hybrid approaches for their database deployments. While DBaaS is the most popular solution by far, container-based and VM based solutions are still very important, especially in industries that are not ready yet to commit 100% to public cloud or in geographies where data locality requirements require on-premises deployments.

More and more enterprises decide to program to the Postgres API, which allows them to deploy on virtually every cloud, all major operating systems, and all key hardware platforms. Postgres supports bare metal, VM-based, container-based, and DBaaS-based deployments with a uniform API, thus creating the ideal database platform for hybrid deployment strategies.

15 Conclusion

DBaaS, or cloud database, is rapidly becoming the norm for databases in the enterprise. They are provided by a multitude of vendors ranging from cloud service providers who run multiple databases to specialty vendors who are deeply involved in the development of the database. Customers should be careful about selecting their DBaaS vendor, as databases are not commodities—they are mission critical capabilities.

Security considerations are key. Industry standards, such as AIPCA’s SOC or IEC/ISO 27000,create a powerful tool to assess and ensure security in the cloud - especially when used in conjunction with powerful built-in database security mechanisms. Often, putting databases into the hands of a reputable and competent cloud provider may be a more secure solution than running a database in-house.

DBaaS adoption, and especially Postgres DBaaS adoption, is at full swing. We highly recommend considering Postgres databases services from specialty vendors, such as Crunchy Data or EDB, especially as not all databases will go to the cloud and the majority of enterprises will need the same Postgres everywhere, on premises as well as in private cloud and public cloud.

16 Acronyms and Abbreviations

AICPA - American Institute of Certified Public Accountants
AKS -  Azure Kubernetes Service
API - Application Programing Interface
ASB - Auditing Standards Board
AWS - Amazon Web Services
CDO - Cloud Database Operator
CLI - Command Line Interface
DBaaS - Database as a Service
DBPaaS - Database Platform as a Service
EC2 - Elastic Compute Cloud
EKS - Elastic Kubernetes Service
FedRAMP - Federal Risk and Authorization Management Program
GCE - Google Compute Engine
GDPR - General Data Protection Regulation
GDPR DPA - GDPR Data Processing Agreement
GUI - Graphical User Interface
HIPAA - Health Insurance Portability and Accountability Act
HIPAA BAA - HIPAA Business Associate Agreement
IaaS - Infrastructure as a Service
IaC - Infrastructure as Code
IEC - International Electrotechnical Commission
ISO - International Standards Organization
NIST - National Institute of Standards and Technology
OS - Operating System
PaaS - Platform as a Service
PCI - Payment Card Industry 
PCI DSS - Payment Card Industry Data Security Standard
PHI - Personal Health Information
PII - Personally Identifiable Information
RDS - Relational Database Service
SaaS - Software as a Service
SOC - System and Organization Control
SLA - Service Level Availability
SRE - Service Reliability Engineer
VPC - Virtual Private Cloud

 


¹  DBaaS implementations commonly do not provide users access to the SUPERUSER role, as this would interfere with the shared responsibility model.

² DBaaS may not provide access to the superuser role.

Marc Linster, Ph.D., is EDB’s Chief Technology Officer. Marc is committed to EDB being an accelerator to providing architectural “know how” to help customers take advantage of Postgres without significant risk and cost. Marc believes that although new customer adoption of open source is easier than ...