[AUDIO BLOG] The Builders: "Is Postgres on Kubernetes Right For Your Business?" with Gabriele Bartolini

Postgres Data on Kubernetes

Thanks for tuning into EDB’s audio blog series, The Builders, where business and tech thought leaders weigh in on top database industry trends and insights.

In this episode, Gabriele Bartolini, VP of Cloud Native at EDB, sits down with Sanjeev Mohan, industry analyst and Principal at SanjMo, to discuss how deploying Postgres with Kubernetes can further extend Postgres’ cloud native capabilities, and whether this deployment strategy is the right choice for all businesses.

Learn more about how Kubernetes and cloud native technologies amplify Postgres in Gabriele Bartolini's on-demand webinar "Fuel the DevOps Movement and Innovate Faster with Cloud Native Postgres."

Transcript:

Hello everyone, this is your host Sanjeev Mohan of the It Depends podcast series. As you can see, I am in a very special place: the top of the conference center in Amsterdam, where I am attending day two of KubeCon Europe 2023. The museum behind me is a very special place. In fact, this place is so special that, finally, the fury of generative AI has been muted by the frenzy of Kubernetes.

But the question many of us data professionals still ask is: Is Kubernetes really the right solution for our data needs? To answer that, I went looking for Gabriele Bartolini, who joins us from EnterpriseDB. We've been talking about doing this podcast ever since we met in Valencia at KubeCon, almost a year ago. So, Gabriele, thank you for coming. So nice to have you on this episode. Pleasure.

Yeah, hi everyone. And you're from Italy, Prato in Tuscany?

No, I'm from the north of Tuscany, very close to Florence. It's a beautiful city.

Very nice. Maybe we should do the next recording in Prato.

Yeah, yeah. It would be strange, but yeah, why not?

How long have you been involved with Postgres?

So, I think it's since 1999.

Yeah, I see. So, almost 25 years.

Yeah, almost.

I see. And what were you doing at that time?

I had been fascinated by open source, you know. I was in love with Linux. Then I was studying statistics at the University of Florence, and I fell in love with data warehousing and data mining, all those kinds of disciplines. While doing my research, I wanted to use a database; I tried MySQL first, and then I decided to go with Postgres. Since then, I've never stopped using Postgres, to the point that it became my business: in 2008 I co-founded 2ndQuadrant with Simon Riggs and Gianni Ciolli, which was then acquired by EDB in 2020. And now I'm at EDB.

I see. That's the whole story. And you've been a contributor to Postgres?

Yeah, yeah. So, my main contribution has been Barman, our popular open source backup and recovery tool for Postgres. In terms of community, I'm one of the founders of the European PostgreSQL association, and I'm the one who organized the first Postgres conference in Europe, back in 2007. Since then, there have been conferences all over the world, so I'm really happy about that.

That is great. So, let's pivot to Kubernetes here. How did you make that shift to Kubernetes? Is Kubernetes the right way to go? What's going on in that space?

You know, it's the right tool if it works for you. But you can't just jump on it without knowing what you're getting into.

So you need to learn it, and it requires a mindset shift. You need to really open the door, look at what's there, and learn. When it comes to certifications, I think the CNCF provides a lot of useful certification paths. I believe the Certified Kubernetes Administrator (CKA) certification provides a lot of the information you need if you want to manage databases.

I came across these things thanks to a DevOps journey with my team. We were always learning and looking for ways to improve the development process and increase productivity. We also wanted to create a happier environment with motivated people who feel attached to the organization. That's great.

The bridge between database management systems and Kubernetes is the operator. So, what is the purpose of an operator? Basically, an operator is a pattern that has become the standard way to extend Kubernetes with a controller that simulates and automates what a human would do. In the case of Postgres, the operator needs to simulate what a human would do in the event of a failure, or even just in the standard lifecycle of a database.

For example, we have developed an operator that encapsulates our 20+ years of experience. It manages one or more Postgres clusters. So, the operator abstracts what you would manually do on a database and in Kubernetes. It's like infrastructure as code. You can write a declarative statement that says, "Go, go, deploy!"

In a declarative world, we define the desired state of our cluster. For example, instead of going through an imperative series of steps to create a three-node Postgres cluster (create the primary, clone it to create one standby, clone it again to create the other), you simply say, "I want Postgres 15 with one primary and two standbys." The reconciliation loop of the operator ensures that there is always such a cluster. If one node goes down, it is recreated, and if the primary goes down, a failover is automatically initiated.

The operator is written in Go, just like Kubernetes itself. So, the operator is in Go, but the end user writes in YAML. They don't need to know the underlying language; it's entirely transparent. We follow the convention over configuration approach. With just a few lines of YAML declaration, for example, even just five lines, our operator can create the desired state.
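To make that concrete (this example is mine, not from the episode), a minimal CloudNativePG manifest along these lines declares a Postgres 15 cluster with one primary and two standbys; the cluster name, image tag, and storage size are placeholders:

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-example                                # hypothetical cluster name
    spec:
      instances: 3                                    # one primary plus two standbys
      imageName: ghcr.io/cloudnative-pg/postgresql:15 # Postgres 15 image
      storage:
        size: 10Gi                                    # illustrative volume size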

The syntax used in YAML is specific to each operator. We have our own specification. It's not a standard syntax across operators; it varies from one operator to another.

OK, so basically, we define a public API, and that's how we interact with Kubernetes using a YAML file. We use kubectl apply to apply the YAML file, and that's it. The primary is created, and then the operator creates the replicas, simulating what a human would do. The advantage of Kubernetes is its self-healing capability. If one of the instances goes down, it can automatically recover.
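As a quick sketch of that workflow, assuming the hypothetical manifest above is saved as pg-example.yaml:

    kubectl apply -f pg-example.yaml   # submit the desired state to Kubernetes
    kubectl get cluster pg-example     # ask the operator for the cluster's status
    kubectl get pods                   # the primary and replica pods appear as they are created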

Kubernetes also provides scalability. You can scale up and scale down by changing the configuration. For example, you can change the number of instances from 2 to 3, and a new replica will be generated. Then you can scale it down from 3 to 2.
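A rough illustration of that scale-up and scale-down, using the same hypothetical cluster name (you could equally edit the manifest and re-apply it):

    # scale from 2 to 3 instances: the operator clones a new replica
    kubectl patch cluster pg-example --type merge -p '{"spec":{"instances":3}}'
    # scale back down from 3 to 2: the extra replica is removed
    kubectl patch cluster pg-example --type merge -p '{"spec":{"instances":2}}'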

In our operator, we have defined additional features, including logging and monitoring. Logging is returned in JSON format to standard output, following Kubernetes recommendations. We provide pre-configured metrics for monitoring, and users can extend them by writing their own queries. There are a lot of DevOps operations involved.
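As an illustrative sketch of such a user-defined metric (the ConfigMap name, key, and query are my assumptions, not from the conversation), CloudNativePG lets you point the cluster at a ConfigMap containing postgres_exporter-style queries:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pg-example-monitoring          # hypothetical ConfigMap holding custom queries
    data:
      custom-queries: |
        pg_database_size:
          query: "SELECT datname, pg_database_size(datname) AS size_bytes FROM pg_database"
          metrics:
            - datname:
                usage: "LABEL"
                description: "Name of the database"
            - size_bytes:
                usage: "GAUGE"
                description: "Size of the database in bytes"
    # referenced from the Cluster resource roughly like this:
    #   spec:
    #     monitoring:
    #       customQueriesConfigMap:
    #         - name: pg-example-monitoring
    #           key: custom-queries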

Regarding the operator we have developed, it was originally proprietary. However, last year we open-sourced it and donated it to an independent community as CloudNativePG. The operator is now entirely owned by the CloudNativePG community. We applied for the CNCF Sandbox last year, but it was rejected. We plan to reapply and eventually donate the project to the CNCF.

There is an open governance model in place, and everyone is encouraged to participate in the growing community. CloudNativePG is the name of the open-source Kubernetes operator developed by EDB. However, it's worth mentioning that there were other operators in the Postgres community before ours. Zalando's operator has been in production for many years, and there are other operators such as Crunchy Data's. Each of those operators relies on its own failover management tool.

Our operator, on the other hand, has that logic built in. It knows the cluster's status without relying on another component to manage it; it uses the Kubernetes API to assess the status. CloudNativePG is being used quite a lot at this conference. Many companies are showcasing database-as-a-service offerings built on Postgres with our operator.

You mentioned seeing some screens. Yes, there are companies displaying their solutions, and when we ask them, they say they are using the operator.

You also mentioned the "million-dollar question" about stateful sets. Initially, when they came out, they were meant for stateful applications, but for a long time we kept hearing doubts about whether Kubernetes was really ready for databases. Finally, in February, Kelsey Hightower put his weight behind running databases on Kubernetes. He said it's OK, and he feels it has reached a certain level of maturity.

I believe his statement aligns with our opinion that we developed throughout our journey. Essentially, if I interpret his words correctly, he says that with a good operator, you can run Postgres in Kubernetes just like you were running it in VMs before. This message came out in February, and I've already had people approach me saying that Kelsey's endorsement makes them confident in using databases with Kubernetes. We understood early on that it was not only feasible but also the best way to run Postgres. It's incredible.

As for overhead, if any, is there a performance or latency penalty?

Yeah, so basically when I started this project, our first experiment was designed to fail fast. I didn't want to waste my team's and organization's time on something that wouldn't be used by our largest customers in the world. So we set up a physical on-premises cluster where each node had local disks, and we benchmarked both the storage and the database. First we tested directly on Linux, and then through Kubernetes and its storage layer. The performance penalty was less than 1%, which is negligible. In terms of OLTP transactions, the impact was less than 4%. That's when we realized that running Postgres in Kubernetes was feasible.
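The episode doesn't spell out the exact benchmark commands; a comparable experiment could be sketched with fio for the raw storage and pgbench for OLTP throughput (the paths, scale factor, and duration here are arbitrary):

    # storage layer: sequential writes against the volume Postgres will use
    fio --name=seqwrite --rw=write --bs=1M --size=4G --directory=/var/lib/postgresql/data
    # database layer: initialize a pgbench schema and run an OLTP-style workload
    pgbench -i -s 100 app
    pgbench -c 16 -j 4 -T 300 app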

At that time, we were using OpenEBS, and I published a blog article about our experiment in June 2020. Through that, I was contacted by the CEO of MayaData, who informed me about the Data on Kubernetes community and invited me to be involved in building awareness about running stateful workloads in Kubernetes. This community has grown significantly and played a crucial role in building confidence in running various types of stateful workloads. You have the flexibility to choose shared architectures or dedicated architectures where a node is fully dedicated to a single Postgres instance, even with a SAN attached, as we used to do in the past.

Before we dive into topics like portability and reliability, you mentioned the Data on Kubernetes community. Yes, there was a highly attended session, and there will be a surprise guest discussing the Data on Kubernetes community after our conversation. So stay tuned for more information on how to join and participate actively. But let's get back to how Kubernetes managing databases will impact the work that DBAs do.

I want to share something that happened about a month ago during a workshop. A DBA approached me and expressed concern about losing their job because of automation and AI. I reassured them and said that their job actually has an opportunity to be elevated. Their skills can become more valuable to developers and the entire organization. They can focus on tasks that cannot be easily automated and prevent incidents that used to wake them up at 3 AM, like when the primary database went down.

In my opinion, going back to the DevOps culture within an organization, what we do is complex, and it requires collaboration and expertise from different roles.

OK, so I believe we need to work in teams that consist of individuals with multiple skills, ideally with T-shaped profiles. These profiles are specialized in one area but can also have knowledge in other areas. For example, a T-shaped DBA can communicate with a T-shaped developer and a T-shaped DevOps engineer. The horizontal part, which involves tools like pipelines and GitHub, can be handled by the various technologies available in the organization.

It's important for these teams to understand the boundaries of their respective roles and the connections between them, so they can collaborate effectively as a unified team. The DBA can then focus on more interesting tasks like monitoring, creating dashboards, optimizing indexes, and ensuring the overall health of the database. They don't need to worry about mundane tasks like managing the underlying infrastructure since that can be automated with tools like our operator.

Of course, there will be a learning curve and some unlearning involved, but I believe it's for the best. I remember that even the DBA I spoke to during the workshop agreed with this shift. The industry refers to these automated tasks as "undifferentiated heavy lifting" or the monotonous pieces that can be automated by Kubernetes. DBAs can now focus on higher-value tasks like data integrity validation, ensuring data correctness, and helping developers with SQL-related matters.

Speaking of SQL, I think it's the most underestimated language we have. It's incredibly rich, and I believe we are only using a fraction of its potential. That's why I'm working on moving forward with Postgres and Kubernetes to rekindle what we started 15 years ago when we were building the Postgres market. I want to bring that same community spirit to the new generation and educate them about what Kubernetes and Postgres can achieve together. We need to explain SQL and other relevant concepts to the new generations. It's a very exciting prospect.

Yes, it does feel like we're witnessing a rebirth of sorts. Just like 15 years ago, there was an active community around Postgres, and now we have the same with the addition of Kubernetes, which brings a new level of richness and possibilities. It's an opportunity to start a new S curve, rather than stagnating in maturity and eventually declining. I find it all very fascinating and look forward to the future.

Yes, I agree that we have a completely different audience now. Currently, databases are not seen as they should be seen, but that's about to change. The new audience will come from developers who are born in the cloud-native era, where working directly with databases is the norm. These younger developers are accustomed to writing microservices, and now they can incorporate Kubernetes orchestrated databases into their workflow.

In fact, one of the goals of our operator is to design the microservice database. The idea is to make each developer and each application responsible for their own database. We're moving away from the monolithic database approach. With this new paradigm, developers can be almost independent in developing their own applications, including the database in their pipelines. They can test upgrades, migrations, and ensure that their application works correctly with the database, all within their development pipeline. This enables continuous delivery, especially for applications that are deployed multiple times a day.

We have the opportunity to introduce automated gates that also check the database, ensuring that new features are delivered in a faster and more reliable way. This shortens the lead time and brings value to the organization. It's a significant shift in accelerating development and delivering the new features that businesses have been asking for. In the past, making changes to the data model would involve a long and complicated process. Now, we can respond faster, and with the clarity of configuration and version control, we have a clear understanding of the software combinations and versions in our infrastructure. This also facilitates compliance and change management operations.

It's refreshing to see this view of Kubernetes as more than just a back-end automation tool for deployments and high availability. The developer community now has control, and they can easily incorporate a local database that coexists with their application, taking responsibility for it. This is where DBAs need to be integrated. They can work with developers, helping them write queries, tests, and model the database. It's one of the tasks that DBAs can now focus on.

You mentioned that complexity often comes up as a concern. I find it interesting because it's similar to the human body. The human body is complex, and we need all its parts to function properly. Similarly, managing databases is complex because we're not just managing a machine; we're managing an entire process that spans across multiple data centers and infrastructure.

Yes, we are indeed pushing down complexity and management risks from the application level to the infrastructure level. That's why we need skilled individuals who understand how to set up and manage the infrastructure. You can choose to have a self-managed infrastructure or opt for managed services provided by the cloud service provider. In either case, having the right skills is crucial.

By leveraging technologies like availability zones and stretching clusters across multiple data centers, we can reduce the risk and cost at the application level. If one availability zone or data center goes down, the workloads can seamlessly function in the other zones or centers, ensuring high availability. In the past, manual intervention at the database level was required, but with the operator, failover and promotion can be automated, minimizing downtime and data loss.
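In CloudNativePG terms, spreading instances across availability zones is typically expressed through the cluster's affinity settings; a minimal sketch (the topology key is the standard Kubernetes zone label, everything else is a placeholder):

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-stretched                             # hypothetical name
    spec:
      instances: 3
      storage:
        size: 10Gi
      affinity:
        enablePodAntiAffinity: true
        topologyKey: topology.kubernetes.io/zone     # keep primary and standbys in separate zones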

Kelsey's analogy of the human body being complex was interesting. Just as most people don't understand the intricate workings of their own bodies, developers using Kubernetes don't need to worry about the underlying complexity of the infrastructure. The Kubernetes operators abstract away that complexity, allowing developers to focus on writing scripts and executing commands without having to deal directly with Kubernetes itself.

You're correct that if issues arise at the Postgres layer, Postgres expertise is needed, just as a brain surgeon would be needed for specific medical conditions. Going back to our open-source Postgres operator, CloudNativePG: it is indeed used by EnterpriseDB's BigAnimal DBaaS. This showcases the confidence we have in running data in Kubernetes. We maintain a fork of the community-driven CloudNativePG operator with minimal differences, called EDB Postgres for Kubernetes. We provide support for OpenShift, Rancher, Tanzu, and other platforms. Additionally, we offer long-term support versions to meet organizations' needs for extended support periods.

Regarding the CloudNativePG operator, it doesn't need to be recompiled for each platform; it is available to run on various cloud service providers such as Amazon Elastic Kubernetes Service (EKS), Microsoft Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE). This standardization of the infrastructure is a significant advantage, removing the vendor lock-in associated with specific cloud service providers.

Yes, you can indeed have a setup where the primary cluster is in one cloud provider, such as Amazon Elastic Kubernetes Service (EKS), and the replica cluster is in another cloud provider, like Azure. This allows for a multi-cloud configuration and provides portability between cloud providers. It's particularly useful for scenarios involving data sovereignty and compliance with data residency laws. For example, you can have primary operations in one data center and then, every few months, switch the primary operations to another data center. This multi-cloud resilience mitigates the concentration risk associated with relying solely on one cloud provider.
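A rough sketch of the replica side of such a setup, assuming the primary cluster archives its WAL and backups to an object store that both providers can reach (the cluster names, bucket, and credentials are placeholders):

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-replica-aks                 # hypothetical replica cluster in the second provider
    spec:
      instances: 3
      storage:
        size: 10Gi
      replica:
        enabled: true                      # keep this cluster as a continuously recovering replica
        source: pg-primary-eks
      bootstrap:
        recovery:
          source: pg-primary-eks           # bootstrap by restoring the primary's backups
      externalClusters:
        - name: pg-primary-eks
          barmanObjectStore:
            destinationPath: s3://example-bucket/pg-primary-eks
            s3Credentials:
              accessKeyId:
                name: object-store-creds   # hypothetical Secret
                key: ACCESS_KEY_ID
              secretAccessKey:
                name: object-store-creds
                key: ACCESS_SECRET_KEY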

In the database space, vendor lock-in has been a well-known challenge. However, with the operator, you have the flexibility to move away from vendor-specific databases like Oracle. Customers have shown interest in migrating from on-premises virtualized Oracle databases to PostgreSQL with the help of the operator. The operator, both in its open-source version and as EDB Postgres for Kubernetes, supports different architectures such as ARM and IBM Power Systems. Additionally, the Oracle compatibility layer in EDB Postgres Advanced Server and the upcoming multi-master product, EDB Postgres Distributed, extend the capabilities further.

Regarding the practicality of having the primary in one cloud and a replica in another, there are two approaches: storage-based replication and application-based replication. EDB has extensive experience in Postgres replication systems, including streaming replication, which is highly controllable at the transaction level. The operator incorporates this replication capability, leveraging the write-ahead log (WAL). By archiving WAL files to an object store at regular intervals, replication to another region can occur without requiring a direct connection between the clusters. This enables continuous backup and archiving in one region and continuous recovery in the other. Out of the box, this setup provides an RPO (Recovery Point Objective) of five minutes, meaning you could potentially lose up to five minutes of data in the event of a failure.
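On the primary side, that continuous archiving is declared on the Cluster itself; a minimal sketch with the same placeholder bucket and Secret as above (WAL is archived at regular intervals, which is what yields the out-of-the-box RPO of roughly five minutes just mentioned):

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-primary-eks
    spec:
      instances: 3
      storage:
        size: 10Gi
      backup:
        retentionPolicy: "30d"             # how long base backups and WAL files are kept
        barmanObjectStore:
          destinationPath: s3://example-bucket/pg-primary-eks
          s3Credentials:
            accessKeyId:
              name: object-store-creds
              key: ACCESS_KEY_ID
            secretAccessKey:
              name: object-store-creds
              key: ACCESS_SECRET_KEY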

For customers who require a lower RPO or almost zero data loss, a network connection can be established between the clusters for streaming replication. The operator supports dual-channel streaming replication, which can bring the RPO down to nearly zero, depending on the latency and specific requirements. While synchronous replication across regions may not be feasible due to latency, achieving an RPO of almost zero within regions is possible.

Yes, we have customers who have requested and implemented such setups with different RPO requirements. Some customers are satisfied with object storage-based replication, while others opt for streaming replication for a lower RPO. The choice depends on their specific needs, tolerable data loss, and the desired level of synchronization.

In the next release, version 1.20 of CloudNativePG, there are a couple of roadmap items that will be included. First, there is the introduction of declarative role management: in the YAML file, users can define the Postgres roles they want, and the operator will ensure that those roles are maintained at all times. Second, there is the introduction of hibernation. With hibernation, a cluster can be put into a hibernated state in which the pods are shut down but the Persistent Volume Claims (PVCs) are retained. This is made possible because the operator manages the PVCs directly instead of using stateful sets.
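A sketch of how these two features can be expressed, with field names reflecting my reading of the CloudNativePG API and a hypothetical role and cluster name: declared roles live under spec.managed.roles, and hibernation is requested through an annotation on the cluster.

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-example
      annotations:
        cnpg.io/hibernation: "off"         # set to "on" to shut the pods down while keeping the PVCs
    spec:
      instances: 3
      storage:
        size: 10Gi
      managed:
        roles:
          - name: app_reader               # hypothetical role kept in sync by the operator
            ensure: present
            login: true
            inRoles:
              - pg_read_all_data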

Stateful sets are commonly used for managing persistence in other operators, but we decided to bypass them and control the PVCs directly in CloudNativePG. By managing the PVCs directly, features like volume expansion, controlled rolling upgrades, and fault investigation become possible. For example, if there is a need to investigate data integrity, the operator can bring down the pod, allowing access to the data directory to check for any corruption. This level of control over the storage and data files is a significant advantage of self-managing a database with CloudNativePG.

Additionally, the operator provides options for storage-level encryption. With the open-source operator, encryption at the storage level is available. In EDB Postgres 15, there is even support for Transparent Data Encryption (TDE), not just in Postgres 15 but also in earlier versions like Postgres 9.5. This provides an extra layer of security for the data.

Regarding the choice between self-managing databases and Database-as-a-Service (DBaaS) or serverless solutions, it depends on the organization's specific needs, skills, and desire for control over the data. If an organization already has strong Postgres skills and wants full control over the data, self-managing a database with CloudNativePG can be a suitable choice. It allows for fine-tuning of configuration, storage-level encryption, and security measures that may not be available in DBaaS offerings. However, if an organization lacks Postgres expertise or prefers a managed service approach, DBaaS or serverless options could be more suitable.

The discussion highlights that the decision ultimately depends on what works best for each organization, considering factors such as skills, control requirements, and specific use cases. Kubernetes is being widely adopted for various data technologies, including Kafka, MongoDB, and Cassandra, each with its own operators. Therefore, organizations already familiar with Kubernetes may find that self-managing a Postgres database with CloudNativePG aligns well with their existing skills and allows for granular control over their data.

As the conversation comes to a close, the focus shifts to the second guest, Melissa Logan, CEO and founder of Constantia.io. She manages various independent communities, including the Data on Kubernetes Community and the Data Mesh Learning Community. The Data on Kubernetes Community serves as a place for end users to gather, share best practices, and exchange resources related to running data workloads on Kubernetes. It currently has over 4,000 members on its Slack instance and is open for anyone to join.

The community operates with the support of several sponsors, including platinum sponsor DataStax and gold sponsors Google and Percona. Additionally, there are around 20 silver sponsors backing the community's activities. The community also runs a Community Collaborator Program, which gives foundation-backed open-source projects the opportunity to collaborate and engage with the Data on Kubernetes Community. The goal is to foster discussions and knowledge sharing across different projects and technologies used for running data workloads, such as Apache Spark and Rook.

As the episode concludes, the host expresses gratitude to Melissa Logan for joining and sharing insights about the Data on Kubernetes Community. The viewers are encouraged to continue their Kubernetes journey.
