Postgres on Kubernetes or VMs: A Guide & Framework for Running Databases the Best Way
Kubernetes (K8s) is steadily gaining popularity. As interest in containers and K8s grows, Postgres is also becoming a core technology that many developers and users want to deploy in the same environment. This led me to think about the framework one should use to decide between running Postgres in K8s and running it on a dedicated VM.
This post provides a framework to follow to ensure you’re running your database optimally, and will help you determine which deployment method is best for running Postgres to meet your needs.
What is Kubernetes?
Kubernetes is a container orchestration system. It is an open-source platform that runs a cluster of worker and master nodes, allowing teams to deploy, manage, scale and automate containerized workloads. Kubernetes can manage many applications at a massive scale, including stateful applications such as databases or streaming platforms.
What is PostgreSQL?
PostgreSQL, or Postgres, is an object-relational database system that uses the SQL language to perform queries. It provides features that let users safely persist and scale data workloads. Postgres is open-source and free, and has proven to provide flexible and reliable features for a wide range of applications.
If you ask the DevOps team, they would prefer things simplified and straightforward. They tend to treat the Postgres database the same way they treat normal apps/services they’d manage in K8s. In other words, they would like to manage and run many micro Postgres services.
The key benefit of working in the K8s environment is the ability to easily manage generic workloads. If you already understand the generic workload requirements and are running them as micro Postgres services, using the K8s platform is a solid choice, enabling seamless management.
When to Use K8s
The following are some use cases of K8s, which can help clarify how you might use Postgres in K8s:
- Most of the time, when your developers are developing an app, they need access to a database. EDB Postgres on Kubernetes can help by deploying the database quickly and making it available for development purposes. If a developer wants to start a new app from scratch, they can remove the old database and create a new one at their convenience. It can also work the other way around: you create a database for a development cycle and remove it when the cycle is finished.
Testing App with the Database
- For functional and integration testing of the app, many developers want to automate the test cases. As part of that automation, they want to perform end-to-end microservices testing, which includes starting the app microservices and the database as a service. After each code change, developers want to re-run the test cases against the end-to-end service.
- Part of the testing might involve changing character sets, changing the location of WAL files (to faster storage), changing Postgres versions, etc. Postgres in K8s recreates the cluster end to end, so running with a different character set or Postgres version is as easy as changing one parameter in the deployment definition. This is the big advantage of infrastructure as code. Furthermore, because the environment is created for the test and cleaned up afterwards, the test platform is scalable. For example, it is possible to run all tests for different versions, character sets, etc. in parallel, and the resources are only required at test time.
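The parallel test matrix described above can be sketched roughly as follows. This is a minimal illustration, not an EDB or Kubernetes API: `build_spec`, `run_suite`, and the field names `postgres_version`, `encoding`, and `wal_storage_class` are hypothetical placeholders for whatever your deployment definition actually uses, and `run_suite` only simulates the create, test, and clean-up cycle.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical values to test against; adjust to your own matrix.
POSTGRES_VERSIONS = ["12", "13"]
ENCODINGS = ["UTF8", "LATIN1"]


def build_spec(version, encoding):
    """Return a hypothetical deployment definition for one test cluster.

    The keys are illustrative only; a real operator defines its own schema.
    """
    return {
        "name": f"pg-test-{version}-{encoding.lower()}",
        "postgres_version": version,
        "encoding": encoding,
        "wal_storage_class": "fast-ssd",  # assumed storage class name
    }


def run_suite(spec):
    """Simulate create -> test -> delete for one cluster.

    A real implementation would apply the definition to the K8s cluster,
    run the test cases, and delete the deployment afterwards.
    """
    steps = ["create", "test", "delete"]
    return spec["name"], steps


def run_matrix():
    """Run every version/encoding combination in parallel."""
    specs = [build_spec(v, e) for v, e in product(POSTGRES_VERSIONS, ENCODINGS)]
    # One worker per combination; resources exist only while the test runs.
    with ThreadPoolExecutor(max_workers=len(specs)) as pool:
        return dict(pool.map(run_suite, specs))


if __name__ == "__main__":
    for name, steps in run_matrix().items():
        print(name, "->", ", ".join(steps))
```

Each combination differs from the others by a single value in the definition, which is the point of the infrastructure-as-code approach: adding a new Postgres version to the matrix means adding one entry to a list.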
Microservice Production Environments
- There are use cases where developers would like to keep the database close to their app services, which we define as a microservice dataset. In such an environment, all functionality is built from many small microservices. A microservice typically holds only the data that is directly relevant to it. A microservice dataset can therefore be easily identified as a dataset holding data for just one microservice (typically one schema, a few tables, etc.).
- Microservice datasets are good candidates for K8s, which makes them easy to manage and maintain.
- A clear example of a microservice dataset would be the database data for a service that is responsible for showing inventory information (like products and price information). In that case, it makes sense to store that specific data in a Postgres database running in K8s next to the microservice.
- K8s is a more complex infrastructure than 'just a bunch of VMs'. But deploying a workload on K8s (like Postgres on K8s) is far easier than deploying on a VM. So, although deploying a Postgres workload on a few VMs is easier than building a K8s cluster, deploying and maintaining hundreds or thousands of Postgres deployments is far easier in a K8s environment, even if you include building that K8s environment in the first place.
- It helps if the workloads are of similar size and share generic tuning, configuration, and handling, because exceptions make it more difficult to size and tune the generic K8s environment.
When Not to Use K8s
If your workload is less generic, or you have very specific workloads (which require more CPU, I/O, etc.) and databases that need special attention, you could create a special K8s node pool for those database services, but it might soon be better to move such databases to their own dedicated VMs. Choosing the right path is, of course, driven by the cost and effort of running specific node pools versus the cost and effort of running separate VMs for some deployments.
The following is an initial framework that can help you decide between running Postgres on a virtual or dedicated machine and running it in K8s.
For some database workloads (like data warehouse workloads), CPU pinning can be beneficial. It keeps specific CPUs reserved for the workload and binds the database to run on those CPUs, which greatly improves cache usage and other lower-level optimizations. If CPU pinning is required, it is technically possible to do this on Kubernetes, but moving to VMs probably makes much more sense.
EDB PEM is really useful to monitor CPU utilization and can give you some insight into this.
|Memory||If your database consumes a larger share of memory than a Kubernetes node can offer, Kubernetes has a harder time scheduling the pod. If the memory request exceeds node memory entirely, the pod will not be scheduled at all. Running larger nodes is an option, either across the whole cluster or in a second node pool, but it might be better to keep the default node size for the generic workload and move the few exceptionally large consumers to separate VMs instead.|
|I/O Tuning||Storage performs similarly on Kubernetes and on VMs, but VMs allow more flexibility in building specific storage configurations. For example, if you want to increase storage I/O by running software RAID, LVM caching, etc., there is no real option to configure this for Kubernetes workloads. Running such workloads on VMs lets you apply those performance boosts when needed.|
|Backup and Recovery Time||For backups using solutions like BART, duration scales linearly with the size of the database, so backing up larger databases might take too much time. To bring down backup duration for larger databases, other techniques, like storage snapshots, can be used. Although Kubernetes storage providers might expose such features, using them might require switching to other storage options, acquiring third-party storage solutions, etc. If this is required for only some of the workloads, it might be a good reason to move those workloads to VMs. Recovering a backup with solutions like BART requires a lot of disk writes; recovery from storage snapshots greatly improves recovery performance.|
Analytical queries are complex and resource-intensive. Because of their nature, Postgres needs more CPU, memory, and some tuning to serve them. Kubernetes can handle this kind of workload. However, if a few databases take a disproportionate share of the resources in the Kubernetes pool and need specific tuning to perform well, it may be wise to move those databases to their own dedicated VMs.
If you have a data warehousing use case, storing data from different sources for analysis, and it is going to consume a large share of your shared storage and other resources, consider moving the workload to a dedicated VM sized with the resources Postgres requires.
|Filesystem Tuning||If you need to improve performance by tuning the filesystem for a few databases, consider moving those databases to their own dedicated VM or to a special K8s node pool.|
|Upgrade Complexity||Performing a major upgrade by hand can still provide a fast and direct upgrade path. Although we are looking into ways to automate major upgrades for Postgres on K8s, the manual options are currently faster to perform. For applications that can tolerate only a little downtime during major upgrades of large datasets, consider using VMs for the databases and performing these steps manually until Postgres for K8s has faster or online upgrade features implemented.|
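Two of the factors above, memory and backup time, are easy to estimate with simple arithmetic. The sketch below is illustrative only: the node names, free-memory figures, and the 200 MiB/s throughput are made-up assumptions, and the helper names `nodes_that_fit` and `estimate_backup_hours` are hypothetical. It assumes that a pod can be placed only on a node whose unreserved memory covers the pod's memory request, and that a copy-based backup scales linearly with database size, as described in the table.

```python
def nodes_that_fit(request_gib, free_memory_gib):
    """Return node names whose free memory covers the pod's memory request.

    free_memory_gib maps node name -> memory (GiB) not yet requested by
    other pods. These are figures you would gather yourself.
    """
    return sorted(
        name for name, free in free_memory_gib.items() if free >= request_gib
    )


def estimate_backup_hours(db_size_gib, throughput_mib_per_s):
    """Estimate full-backup duration, assuming it scales linearly with size."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = db_size_gib * 1024 / throughput_mib_per_s
    return seconds / 3600


if __name__ == "__main__":
    nodes = {"node-a": 12, "node-b": 28, "node-c": 30}  # assumed free GiB
    for request in (8, 24, 64):
        fit = nodes_that_fit(request, nodes)
        print(f"{request} GiB request fits on: {fit or 'no node'}")
    hours = estimate_backup_hours(2048, 200)  # assumed 2 TiB at 200 MiB/s
    print(f"Estimated backup time: {hours:.1f} hours")
```

A request that fits on only one or two nodes is harder to schedule, and one that fits on none will not be scheduled at all. Those are the cases where a larger node pool or a separate VM becomes worth considering.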
The framework above covers several factors. Weigh all or several of them together, rather than any single one, when deciding whether to move a database from K8s to a dedicated VM. Please note that the framework is intended as a starting point; there may be other factors specific to you (such as the availability of existing resources) that you should also consider.
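One way to apply the framework is as a simple checklist. The sketch below is only an illustration of that idea: the `WorkloadProfile` fields are hypothetical names for the factors discussed above, and the default threshold of two factors is an assumption chosen to reflect the advice not to decide on a single factor. It is not a rule from the framework itself.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkloadProfile:
    """Hypothetical yes/no answers for the factors in the framework."""
    needs_cpu_pinning: bool = False
    memory_exceeds_node: bool = False
    needs_custom_io_tuning: bool = False
    backup_window_too_long: bool = False
    heavy_analytics: bool = False
    needs_filesystem_tuning: bool = False
    needs_fast_major_upgrades: bool = False


def vm_factors(profile):
    """List the factors that point toward a dedicated VM."""
    return [f.name for f in fields(profile) if getattr(profile, f.name)]


def recommend(profile, threshold=2):
    """Suggest a target; the threshold is an assumption, not a hard rule."""
    hits = vm_factors(profile)
    target = "dedicated VM" if len(hits) >= threshold else "Kubernetes"
    return target, hits


if __name__ == "__main__":
    warehouse = WorkloadProfile(needs_cpu_pinning=True, heavy_analytics=True)
    target, reasons = recommend(warehouse)
    print(target, "because of:", ", ".join(reasons))
```

The output of such a checklist is a prompt for discussion rather than a verdict, since factors like cost and the availability of existing resources are not captured in it.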
This blog post was co-authored by Vibhor Kumar, Marc Linster, Dave Page, and Sebastiaan Mannem.