During the creation and rollout of BigAnimal on Microsoft Azure, the EDB Database as a Service (DBaaS) offering, a large customer approached us and asked what it would take for them to create an on-premises DBaaS offering of their own from scratch. This is an interesting question that lets us explore the characteristics of Databases as a Service, but answering it requires a deeper understanding of the features of an ideal DBaaS offering. The purpose of this document is to outline those characteristics. First, it is important to define exactly what a DBaaS is; then we can determine the characteristics of a successful one.
Defining the term DBaaS is not as clear-cut as it would initially seem. To help clarify, the document produced by the U.S. Department of Commerce National Institute of Standards and Technology (NIST) entitled The NIST Definition of Cloud Computing (NIST SP 800-145) is a good starting point. While the NIST document does not define Database as a Service directly, its definitions of the other cloud service models can be used to derive one. NIST defines three service models for cloud computing: Infrastructure as a Service, Platform as a Service, and Software as a Service.
NIST SP 800-145 defines Infrastructure as a Service (IaaS) as a cloud provider service model where the cloud provider provisions processing, storage, networks, and other fundamental computing resources, and the end-user can deploy and run software, which can include operating systems and applications that they own. The consumer does not own, manage, or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls). IaaS functions similarly to traditional software and database management in that the end-user maintains hands-on control over every aspect of their infrastructure except where it is deployed. The main difference is that a cloud server is used instead of a physical one; think of it as the next logical extension of virtual machines.
NIST SP 800-145 defines Platform as a Service (PaaS) as a cloud provider service model where the consumer/end-user deploys onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools provided and supported by the cloud provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment. PaaS differs from IaaS in that the consumer/end-user pays the cloud provider for the use of the development platform and infrastructure instead of purchasing it themselves. The provider installs and maintains any needed components on the server, which includes patching and upgrading both the applications and the operating system. Think of PaaS as a deployed environment ready for application development.
Finally, NIST SP 800-145 defines Software as a Service (SaaS) as a cloud provider service model where the end-user utilizes the cloud provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The end-user does not own the underlying software and does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. All of these tasks are the domain of the cloud provider. Basically, the required software is deployed on a cloud infrastructure and the end-user simply accesses and utilizes the software as needed. Microsoft 365 and Google Docs are common examples of Software as a Service.
Based on this background, the question still remains: where does a database fit into all of these “as a Service” models? A DBaaS does not align directly with any of the three “as a Service” models under the strict NIST SP 800-145 definitions above. However, it is most closely aligned to Platform as a Service, only more narrowly defined: a DBaaS can be thought of as PaaS focused on a database. A DBaaS is a cloud provider service model where the consumer/end-user is provided a database deployed on a cloud provider's infrastructure. All, or most, of the administrative tasks and maintenance of the database and operating system are performed by the cloud provider. The end-user primarily focuses on utilizing the database. Depending on the cloud provider and the underlying database, the consumer/end-user may have some control over the database and configuration parameters.
Now that the DBaaS service model has been defined, the focus shifts to the ideal characteristics of a DBaaS. Again, The NIST Definition of Cloud Computing (NIST SP 800-145) is a good starting point for discussion, as it lists five essential characteristics of cloud computing:
- On-demand self-service - automated provisioning without human intervention.
- Broad network access - access from anywhere with any number of platforms.
- Resource pooling - multi-tenant utilization of hardware and software (with boundaries) to allow for greater flexibility and greater utilization of resources vs idle servers sitting in a datacenter.
- Rapid elasticity - seamless ability to rapidly provision and release resources based on demand.
- Measured service - resource utilization is monitored, controlled and reported by the cloud service provider, which provides transparency for both the provider and the consumer of the service; the consumer is billed for what is used.
While these are indeed essential characteristics of any cloud provider service model and will serve as a basis for any of the “as a service” models, it is important to consider the additional characteristics and details outlined below for a DBaaS model.
Control Plane architecture
A solid architecture is the recipe for success whether building a house or developing a DBaaS model. An API that calls a bash, Chef, Ansible or Terraform script to provision and deploy a database in the cloud is not a DBaaS. A control plane is not simply a provisioning pipeline that runs scripts; rather, it is the centralized “brain” of the DBaaS. It provides a user interface for managing database lifecycle events such as provisioning, scaling, and backing up, and it actively manages the databases in response to operational events such as hardware failures. Responding to failures is especially important here: typical software-as-a-service tends to fail in relatively simple ways compared to DBaaS, where supporting each new user requires deploying and managing independent stateful distributed systems, often on independent hardware. A control plane-based architecture allows more sophisticated management and recovery processes to be implemented in a scalable way. Some off-the-shelf options exist: EDB’s Cloud Native Postgres (CNP) is a control plane-based architecture and could be wrapped in APIs to provide much of a DBaaS experience. By comparison, EDB’s Terraform/Ansible scripts, and even TPAexec for deploying EDB Postgres Distributed, are closer to “provisioning scripts.” The control plane does not need to do everything to be useful; its functionality should be aligned with the responsibility model of the DBaaS. Most DBaaS control planes, for example, will not automatically tune the database engine on behalf of users.
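The difference between a provisioning script and a control plane can be illustrated with a small reconciliation loop. The following Python sketch is only an illustration under stated assumptions: `ClusterSpec`, `ClusterStatus`, and `ControlPlane` are hypothetical names, and "starting an instance" is simulated by incrementing a counter rather than calling any real infrastructure API. The point is that the control plane stores the desired state and keeps comparing it with the observed state, so the same loop handles both the initial deployment and later failures.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Desired state requested by the user (hypothetical fields)."""
    name: str
    replicas: int


@dataclass
class ClusterStatus:
    """Observed state reported by the infrastructure (hypothetical fields)."""
    healthy_instances: int = 0


@dataclass
class ControlPlane:
    specs: dict = field(default_factory=dict)
    statuses: dict = field(default_factory=dict)

    def request_cluster(self, spec: ClusterSpec) -> None:
        # A lifecycle request only records intent; reconcile() does the work.
        self.specs[spec.name] = spec
        self.statuses.setdefault(spec.name, ClusterStatus())

    def report_instance_failure(self, name: str) -> None:
        # Operational event, e.g. a hardware failure detected by monitoring.
        status = self.statuses[name]
        status.healthy_instances = max(0, status.healthy_instances - 1)

    def reconcile(self) -> list:
        """Compare desired and observed state; return the actions taken."""
        actions = []
        for name, spec in self.specs.items():
            status = self.statuses[name]
            wanted = 1 + spec.replicas  # one leader plus its replicas
            while status.healthy_instances < wanted:
                status.healthy_instances += 1  # stand-in for real provisioning
                actions.append(f"start instance for {name}")
            while status.healthy_instances > wanted:
                status.healthy_instances -= 1  # stand-in for decommissioning
                actions.append(f"stop instance for {name}")
        return actions
```

A one-shot script would stop after the first deployment. Here, calling `reconcile()` again after `report_instance_failure()` replaces the lost instance without any new request from the user.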
Self-service / on-demand
Self-service and on-demand dictate that provisioning, deployment, scaling, configuration and other activities are performed automatically in minutes, not hours or days. The service should be API-driven, requiring human intervention only to initiate a process and not to carry it out (emailing the admin for provisioning, scaling, restoring, etc. is not self-service). High availability and disaster recovery should be built in, or at least optional (there may be a “development” option without them). Database recovery, for example, should happen automatically, and the consumer/end-user should not have to take any action to recover a database instance. In a leader failure scenario, a replica should automatically be promoted and new connections automatically routed to the new leader while the old leader is (automatically) repaired and rejoins the cluster as a replica.
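The leader failure scenario described above can be sketched as a small promotion routine. This is a simplified Python illustration, not a production failover algorithm: the `Instance` type, its `replay_lag_bytes` field, and the rule of promoting the replica with the least replication lag are assumptions made for the example, and a real implementation would also need fencing and consensus to avoid split-brain.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    role: str                  # "leader" or "replica"
    healthy: bool = True
    replay_lag_bytes: int = 0  # assumed measure of how far a replica is behind


def fail_over(instances: list) -> str:
    """Promote the most up-to-date healthy replica if the leader is down.

    Returns the name of the instance new connections should be routed to.
    """
    leader = next(i for i in instances if i.role == "leader")
    if leader.healthy:
        return leader.name

    candidates = [i for i in instances if i.role == "replica" and i.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")

    new_leader = min(candidates, key=lambda i: i.replay_lag_bytes)
    new_leader.role = "leader"
    # The old leader is demoted; once repaired it rejoins as a replica.
    leader.role = "replica"
    return new_leader.name
```

The consumer takes no action at any point: the function is meant to be invoked by the control plane when its health checks detect the failure.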
Configuration must be self-service as well, and not just database configuration (e.g., shared_buffers). Network and other infrastructure configuration should also be API-driven, including control over which network segments can access the database. The underlying database should be configurable by DBaaS users, and the more configuration and access the better; however, this must be balanced against the control plane’s ability to manage the system and keep it viable.
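One way to picture that balance is a validation step in the configuration API. The Python sketch below is illustrative only: the split between `USER_TUNABLE` and `PLATFORM_MANAGED` parameters is an assumption chosen for the example (the names are real PostgreSQL settings, but which ones a given DBaaS exposes is a design decision), and `validate_config_request` is a hypothetical function.

```python
import ipaddress

# Assumed policy: settings users may change versus settings the control
# plane must own in order to keep backups and replication working.
USER_TUNABLE = {"shared_buffers", "work_mem", "max_connections"}
PLATFORM_MANAGED = {"archive_command", "wal_level", "listen_addresses"}


def validate_config_request(settings: dict, allowed_cidrs: list) -> list:
    """Return a list of error messages; an empty list means the request is valid."""
    errors = []
    for name in settings:
        if name in PLATFORM_MANAGED:
            errors.append(f"{name} is managed by the platform")
        elif name not in USER_TUNABLE:
            errors.append(f"{name} is not a supported setting")
    for cidr in allowed_cidrs:
        try:
            # Network ranges allowed to reach the database, in CIDR notation.
            ipaddress.ip_network(cidr)
        except ValueError:
            errors.append(f"{cidr} is not a valid network range")
    return errors
```

For example, `validate_config_request({"shared_buffers": "4GB"}, ["10.0.0.0/24"])` returns an empty list, while a request to change `wal_level` is rejected because the platform depends on it.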
Resource pooling is also mentioned in NIST SP 800-145 but deserves additional detail. The underlying infrastructure should be abstracted away from the consumer/end-user. Following the self-service characteristic, they should not have to provision their own VMs and then ask the DBaaS to deploy on them, as this is neither a control plane-based architecture nor self-service. A DBaaS might be multi-tenanted at various levels, or might not be! DBaaS deployments can take many forms. Multi-tenancy may exist within a database engine, where many DBaaS customers share a single PostgreSQL instance. Multi-tenancy within a virtual machine is also a possibility, where there are many single-tenant PostgreSQL instances on a single virtual machine. Likewise, multi-tenancy within an “environment” may exist, where there is one PostgreSQL instance per virtual machine with many virtual machines in the DBaaS “environment”.
EDB recommends multi-tenant within an “environment” - one should not try to share resources at the database level or the virtual machine level as this overly complicates the deployment. Containers can help with “multi-tenant within a virtual machine”, but they are extremely hard to get right. Containers are a much better packaging mechanism than a resource isolation mechanism.
Consumption should be metered on a time basis or a utilization basis. For example, the consumer/end-user should not be “billed” for cores, but instead for the core-hours they utilize. A more “cloud-native”, “serverless” deployment might bill for requests instead of capacity. The measurement should be transparent and result in a predictable cost model.
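As a minimal illustration of core-hour metering, the Python sketch below computes usage and cost for one metering interval. The function names and the rate are hypothetical; `Decimal` is used so that monetary amounts are not subject to binary floating-point rounding.

```python
from datetime import datetime
from decimal import Decimal


def core_hours(cores: int, start: datetime, end: datetime) -> Decimal:
    """Core-hours consumed by an instance with `cores` cores between two times."""
    elapsed = end - start
    seconds = elapsed.days * 86400 + elapsed.seconds  # whole seconds
    return Decimal(cores) * Decimal(seconds) / Decimal(3600)


def charge(cores: int, start: datetime, end: datetime,
           rate_per_core_hour: Decimal) -> Decimal:
    """Cost for the interval, rounded to two decimal places."""
    usage = core_hours(cores, start, end)
    return (usage * rate_per_core_hour).quantize(Decimal("0.01"))
```

A 4-core instance that runs for 90 minutes consumes 6 core-hours; at an assumed rate of 0.05 per core-hour the charge is 0.30, regardless of how many cores were merely reserved elsewhere.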
Rapid elasticity refers to scalability both up and down. The DBaaS must orchestrate the scaling automatically. The consumer/end-user should not need to manually fail over database instances as they are resized; instead, the system scales gracefully. A DBaaS might also scale automatically in response to customer demand; for example, it could increase storage size if storage is at 90% utilization, or add read replicas if existing read replicas are resource constrained.
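The two example rules above can be written down as a small policy function. In this Python sketch the 90% storage threshold comes from the text, while the CPU threshold, the storage growth factor, and the choice of CPU as the measure of a "resource constrained" replica are assumptions made for illustration; `ClusterMetrics` and `scaling_actions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    storage_used_gb: float
    storage_size_gb: float
    replica_cpu_utilization: list  # one value per read replica, 0.0 to 1.0


def scaling_actions(metrics: ClusterMetrics,
                    storage_threshold: float = 0.90,
                    cpu_threshold: float = 0.80,
                    growth_factor: float = 1.5) -> list:
    """Return (action, target) pairs the control plane should carry out."""
    actions = []

    # Rule 1: grow storage once utilization reaches the threshold.
    if metrics.storage_used_gb / metrics.storage_size_gb >= storage_threshold:
        actions.append(("grow_storage", metrics.storage_size_gb * growth_factor))

    # Rule 2: add a read replica only if every existing replica is busy.
    replicas = metrics.replica_cpu_utilization
    if replicas and min(replicas) >= cpu_threshold:
        actions.append(("add_read_replica", len(replicas) + 1))

    return actions
```

The function only decides what should change; carrying out the change gracefully, without the consumer failing anything over by hand, remains the job of the control plane.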
Shared responsibility is a characteristic that was not explicitly mentioned in the NIST SP 800-145 essential characteristics. The concept of shared responsibility dictates that both the cloud provider and the consumer/end-user have some accountability for ensuring that the DBaaS is deployed, configured, and maintained, and that it functions as designed. The cloud provider/vendor takes control of maintenance tasks such as database patching, hardware upgrades, high availability, and backup and restore, whereas the consumer/end-user retains responsibility for query performance, password management, and resource allocation/selection (determining the hardware that is suitable for their expected workload). The division of accountability can vary, and different cloud offerings may draw different boundaries, but the concept of responsibility “shared” by both parties should remain.
This raises an interesting question for an “internal” DBaaS: who is the service provider? Is there an SRE team or a platform team managing the control plane that actually upgrades database instances, receives alerts in case of a failure, and so on? Normally this role is filled by the cloud vendor (EDB, AWS, Azure, etc.), and it should not be overlooked.
Typically, one of the goals of moving to a cloud database deployment is to get rid of proprietary vendor databases that lock the business into long-term contracts. One of the foundations of moving to the cloud is agility: the ability to move the database from on-premises to AWS to Azure, or to switch between them, without major changes to the underlying applications or database. The goal is to not abstract too far from the database the user knows and loves; something that looks and feels like PostgreSQL but does not actually behave like PostgreSQL can be very hard for DBaaS customers to use. It is preferable to build a control plane around a close-to-open-source PostgreSQL and give consumers the database they are used to (compare, for example, Aurora with community PostgreSQL).
Governance and Security
Governance and Security should be at the forefront of any “as a service” model; to be effective, they cannot be an afterthought tacked onto the end of the project. “Governance” refers to the ability of individuals in multiple roles to interact with the service: some users may be able to connect to a database but not provision a new one; some users may be able to scale a database but not delete it. The DBaaS management interface should provide functionality to limit the actions of certain users and to audit the actions of all users. Security does not just mean patching the database and the OS with the latest security patches. The service must also limit the blast radius of a potential security compromise, for example by implementing network segmentation or OS-level controls that limit potential virtual machine escapes. This means threat modeling and other standard security processes are extremely important when designing a DBaaS. Control plane security is also extremely important: if the control plane is compromised, all database instances on the platform are effectively compromised as well.
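The governance requirement, limiting the actions of certain users and auditing the actions of all users, can be sketched as a role check that always writes an audit record. This Python example is a simplified illustration: the role names, the permission sets in `ROLE_PERMISSIONS`, and the `Governance` class are hypothetical, and a real service would persist the audit log and record timestamps and request metadata.

```python
from dataclasses import dataclass, field

# Assumed role model for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"connect"},
    "operator": {"connect", "scale"},
    "admin": {"connect", "scale", "provision", "delete"},
}


@dataclass
class Governance:
    user_roles: dict                      # user name -> role name
    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, action: str, database: str) -> bool:
        """Check whether `user` may perform `action`; audit every attempt."""
        role = self.user_roles.get(user)
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Denied attempts are recorded too, since they matter for auditing.
        self.audit_log.append(
            {"user": user, "action": action,
             "database": database, "allowed": allowed}
        )
        return allowed
```

With this model an operator can scale a database but not delete it, an unknown user is denied everything, and both outcomes leave a trace in `audit_log`.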
In conclusion, creating a Database as a Service from scratch requires a significant amount of effort to establish the infrastructure and architecture needed to support the control plane, self-service and on-demand, resource pooling, rapid elasticity, and shared responsibility capabilities, while avoiding vendor lock-in and making governance and security a cornerstone of the offering. We believe our BigAnimal Database as a Service provides all of these characteristics and capabilities to organizations out of the box.