[AUDIO BLOG] The Builders: "No Hands on Deck: Automate Your Workloads with CloudNativePG" with Lizzie Macneill

February 14, 2024

Thanks for tuning in to EDB's audio blog series, The Builders, where business and tech thought leaders weigh in on top database industry trends and insights.

In this episode, EDB Sales Engineer Lizzie Macneill explains how to unlock the future of streamlined efficiency with CloudNativePG. Listen to discover how this cutting-edge technology is driving innovation and resource efficiency for EDB customers and explore the user-friendly features that make CloudNativePG a game changer.

Transcript:

I wanted to talk a bit today about one of my absolute favorite products, the CloudNativePG Kubernetes operator (CNP), and how it's making EDB customers’ lives easier and driving innovative new development strategies.

Why do I love Kubernetes operators in general? Well, that's pretty easy. I'm a huge proponent of the “it's not lazy, it's efficient” life. If something's repeatable, given a consistent set of variable inputs, I'm all for automating it. I don't want to do the same task over and over; it's super boring and a waste of my time! Automation gives me back my most valuable resource, my time, to work on important tasks that I unfortunately can't assign to a machine. Yet. Though we’ll get there.

Resource efficiency is absolutely key for all of our customers, whether that's operational resources like literally employee time, or asset resources like compute and storage. Kubernetes operators are there to give you both back. Operators give you time back through automation, obviously, and Kubernetes itself gives you resources back by very effectively balancing available resources across concurrent workloads.

A Kubernetes operator is there to perform the tasks of a human operator so you don't have to do them yourself. What does that mean in terms of Postgres? Well, the CNP operator will deploy your database clusters for you, completely hands off, and from that point on, monitoring and availability of the cluster are handled by the operator on Kubernetes. Effectively, from the moment you hand over the configuration to the operator, that's it, you're done. The operator and Kubernetes work together to maintain the desired state of your cluster. So: totally hands-off management. This is the joy of declarative deployment. It’s as good as going to a restaurant and asking somebody to cook you up a quick three-node Postgres cluster. They do it all for you. You can sit back and enjoy a glass of wine with your friends, or in my case, sadly, get back to doing some more interesting work.
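
To make “maintain the desired state” a little more concrete, here is a minimal Python sketch of the reconcile idea behind declarative operators. It compares what you declared with what is running and emits only the actions needed to close the gap. The `ClusterSpec` type, the `reconcile` and `simulate_apply` functions, and the `<cluster>-<n>` naming are hypothetical illustrations, not the CloudNativePG API. The real operator is written in Go and reconciles far more than an instance count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Desired state as a user might declare it (hypothetical fields)."""
    name: str
    instances: int


def reconcile(spec: ClusterSpec, running: set[str]) -> list[tuple[str, str]]:
    """One reconcile pass: return the actions that move `running` toward `spec`.

    Instances are named "<cluster>-<n>" (a hypothetical convention). Running the
    pass again after its actions are applied returns an empty list, which is
    what makes it safe to repeat forever.
    """
    actions: list[tuple[str, str]] = []
    missing = spec.instances - len(running)
    n = 1
    # Too few instances: create the lowest-numbered names not already in use.
    while missing > 0:
        candidate = f"{spec.name}-{n}"
        if candidate not in running:
            actions.append(("create", candidate))
            missing -= 1
        n += 1
    # Too many instances: delete the surplus, taking the last names in sorted order.
    if missing < 0:
        for pod in sorted(running)[missing:]:
            actions.append(("delete", pod))
    return actions


def simulate_apply(running: set[str], actions: list[tuple[str, str]]) -> set[str]:
    """Pretend to carry out the actions and return the new observed state."""
    result = set(running)
    for verb, pod in actions:
        if verb == "create":
            result.add(pod)
        else:
            result.discard(pod)
    return result


if __name__ == "__main__":
    spec = ClusterSpec(name="pg", instances=3)
    observed = {"pg-1", "pg-3"}  # pg-2 has disappeared
    plan = reconcile(spec, observed)
    print(plan)
    observed = simulate_apply(observed, plan)
    assert reconcile(spec, observed) == []  # converged: nothing left to do
```

The user only ever edits the spec. Detecting drift and deciding what to do about it is the loop's job, which is why handing over the configuration really is the last step.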

One of the many aspects of my job is running product demos for customers. And whenever I get into a demo with a customer looking at CNP, I actually find myself apologizing for how boring the demo can be. It takes me literally three clicks to deploy a three-node Postgres cluster, literally three clicks. I'm kidding, obviously; my demos aren't boring, and I show tons of cool features, but you get my point.

I'm an OpenShift girl myself, and it takes me three clicks on the web interface to go from a standing start to a highly available three-node Postgres cluster, all ready to go. It's actually just ridiculous. The customers are like, wow, is it really that simple? And I'm like, yeah, it really is that simple. Three clicks and we're done, what else do you want to know? Here's how to restore a cluster from a backup, also with three clicks and two lines of code... I'm overstating this slightly, but it's crazy how incredibly quick and easy it is to use.

Our CEO actually has a concept that he talks about: minutes to wow. With every product, he asks: how many minutes does this product take to wow a customer?

With CNP, I can do it in under a minute. I'd say that's pretty good, to be honest.

But to take it a bit deeper, there’s a really good reason why it's so incredibly easy to use these operators. The incredible engineers who write these products have been working on these operators for over five years now. They have the execution of true cloud native down to a really fine art. And you know what, they have absolutely nailed it.

What do we mean by that? What do we mean by cloud native? It’s software that is written in such a way that you can actually use it to fully automate processes that used to need a human operator. Obviously there are different definitions out there, but this is what's important to me and what I've seen is important to my customers.

You can scale clusters dynamically, you can spin and bin development clusters with a few clicks. You can manage huge herds of clusters without needing to spend all your time terrified, staring at crazy dashboards, keeping all your fingers crossed that the servers behave themselves today, and all of that stuff.

But the automation part: why is that so hard? I mean, we have automation technologies, we have CI/CD pipelines, we have Git frameworks, we have Ansible… why would it be so hard to write something and automate it? It's just running code, right? If we want to automate deploying our Postgres clusters, we can just write down whatever we would normally do to set up a Postgres database and run that code automatically. Well, as I'm sure you've guessed, the answer is no.

One of the reasons I'm so massively in love with the CNP operator, and I don't mind saying it, is that it's completely self-sufficient. Think about what you do when you deploy a highly available Postgres cluster on some VMs. You locate your Postgres packages, deploy your primary node, set up replication, choose a cluster management tool, mess around with some more packages, go through your routing, go through backups, go through monitoring… there's a massive list of things to do. How much of that process requires human intervention? How much of it could you automate even if you tried? How many exceptions would be thrown that you hadn't accounted for: missing packages, unexpected patches, config changes, new releases, issues with backward compatibility? The point is, the more variables you introduce into your automation process, the more likely it is to throw an exception and fail. This is a really obvious thing to say; it's Captain Obvious time. But it's exactly why immutability is key to automation: components that never change in place leave far fewer variables to go wrong.
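
The “more variables, more failures” point can be made concrete with a back-of-the-envelope model. Suppose an unattended run has n steps, and each step succeeds independently with probability p. Then the whole run succeeds with probability p^n. The independence assumption and the 99% per-step figure below are purely illustrative, not measurements of any real deployment.

```python
import math


def run_success_probability(step_success: list[float]) -> float:
    """Chance that an unattended run finishes, assuming steps fail independently."""
    return math.prod(step_success)


if __name__ == "__main__":
    many_variables = run_success_probability([0.99] * 20)  # 20 moving parts
    few_variables = run_success_probability([0.99] * 5)    # 5 moving parts
    print(f"20 steps: {many_variables:.3f}")  # 0.818
    print(f" 5 steps: {few_variables:.3f}")   # 0.951
```

Even with every step at 99%, twenty of them fail roughly one run in five or six. Trimming the pipeline to five steps raises success to about 95% without making any single step better.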

How does this apply to the CNP operator? It doesn't rely on any external software for availability or failover; it leverages Kubernetes itself to manage the availability of the cluster. It doesn't need external backup software either, because it's completely integrated with the Barman Cloud API.

But why are these things important? Well, for example, if we tried to use VM-centric, non-cloud native software to manage the availability of a CNP cluster, we'd be missing the point of deploying clusters on Kubernetes. That's what Kubernetes is there for: to monitor the state of a cluster and return it to its declared desired state if there are any deviations. Why would you introduce unnecessary complexity and overhead into a deployment by overriding, or even conflicting with, Kubernetes’ core purpose using software designed for a completely different paradigm?

But it goes even further. How do VM- or bare-metal-centric tools actually manage workloads? They work to fix your system when it breaks. Kubernetes works on an entirely different principle. If one of your nodes goes down, we don't try to revive it. It goes straight in the bin: you chuck it away, get a new one, and reattach it. It's super quick and super easy. So my point is, the principles of the VM world and the principles of the cloud native world are, generally speaking, totally at odds. Why would you try to shoehorn software built for the VM world into the cloud native world? It's not just complexity and overhead; you're also introducing additional points of failure and risk. And reliability is everything in an automated system. So I'll say it again for the cheap seats in the back: immutability is key to successful automation.
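
The “don't revive it, bin it” principle fits in a few lines of Python. In this sketch each instance is an immutable record, so a broken one is never patched; it is dropped and replaced by a fresh instance built from the same image. The `Instance` record, the `pg-<n>` naming, and the image reference are hypothetical. In CloudNativePG the equivalent decisions are made by the operator against real Pods and volumes. For a database, a replacement replica must also be resynchronized with the primary, and this sketch leaves that out.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instance:
    """Immutable record of a running instance; it is never patched in place."""
    name: str
    image: str
    healthy: bool = True


def replace_unhealthy(fleet: list[Instance], image: str, ids: Iterator[int]) -> list[Instance]:
    """Return a new fleet in which every unhealthy instance has been replaced.

    Healthy instances are kept as they are; nothing is repaired in place.
    """
    new_fleet: list[Instance] = []
    for inst in fleet:
        if inst.healthy:
            new_fleet.append(inst)
        else:
            # Bin the broken instance and start a fresh one from the same image.
            new_fleet.append(Instance(name=f"pg-{next(ids)}", image=image))
    return new_fleet


if __name__ == "__main__":
    image = "example.registry/postgres:16"  # hypothetical image reference
    fleet = [Instance("pg-1", image), Instance("pg-2", image, healthy=False), Instance("pg-3", image)]
    fleet = replace_unhealthy(fleet, image, count(start=4))
    print([i.name for i in fleet])  # ['pg-1', 'pg-4', 'pg-3']
```

Making the record `frozen` is the design choice that matters here. Trying to “fix” an instance by setting `healthy = True` raises an error, so the only path back to health is replacement.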

Cloud native, or Kubernetes, means many things to many people, but to me it comes down to a very simple concept: if I have to repeat it, I don't want to do it myself. And if I want Kubernetes to do it for me, I have to be sure it doesn't need me to intervene. I need it to be absolutely and completely reliable. Immutability is the absolute key to that freedom, and the CNP operator is built 100% on immutability.

Now that we've effectively established what I want from CNP, where does this drive business value for EDB customers?

There are the very obvious benefits I mentioned: the incredible ease of use, minimal minutes to wow, the reliability, the ease of integration… It really is one of those take-it-anywhere kinds of solutions. But one of the more interesting benefits I see customers going after is getting cloud native benefits without paying public cloud prices or compromising on data sovereignty requirements.

It's a tale as old as time. A customer embarks on a cloud transformation journey, only to run into unexpected costs, unpredictable billing patterns, overblown I/O costs, and what can be a very real difficulty moving efficiently from a CAPEX-centric model to an OPEX-centric model.

In contrast, CNP clusters can be deployed on existing on-prem infrastructure, using Kubernetes clusters to gain the benefits of dynamic scalability, resource efficiency, on-demand infrastructure, frictionless DevOps and so on, without relying on costly public cloud infrastructure to do it.

The data sovereignty issue is even more interesting. Some customers simply can't host their data on public cloud because of data locality or data protection requirements. For these customers, CNP represents a very real and very exciting opportunity to move to a cloud native platform, something they were previously held back from.

On a completely different note, there are also customers with really advanced cloud transformation strategies who are using CNP's replica clusters to deploy genuine multi-cloud Postgres clusters. They're taking advantage of being able to run DR sites, and soon actual active-active configurations, across multiple public clouds, which is incredibly exciting.

Speaking of exciting, another very exciting new feature is CNP's support for Kubernetes volume snapshots, which now lets you use volume snapshots for physical database backups. Our cloud native VP, Gabriele Bartolini, recently published a blog post on this, showing a 4.5 terabyte database being completely restored from a backup in two minutes, which is absolutely phenomenal. I was completely wowed by that news. That's brilliant.

What do I see as potential barriers to entry for this kind of solution? After everything I've said, why isn't the entire world running Postgres on Kubernetes? Do I think Kubernetes has a high barrier to entry? Probably not so much these days. I think the biggest challenge teams generally have when moving to a Kubernetes-based solution, especially for something like a database, is shifting from a bare-metal or VM-based mindset to a microservices paradigm or a cloud-native development framework.

The concepts themselves are simple enough. We've already discussed ‘don't fix it, get a new one.’ But when you're running stateful workloads on Kubernetes, you can see why some people aren't a hundred percent comfortable with that yet. It's easy enough with something like a web server: it's not tracking any data, it's just serving up what's requested. If it fails, you just connect to another one. Not a big deal, right? But a database has to manage concurrent, consistent connections. It uses persistent storage. All connections have to be routed to the same service.

Kubernetes provides what it calls StatefulSets to give pod workloads persistent storage and stable network identities when a disruption occurs. But true to the spirit of Kubernetes, there's very little flexibility in this. For example, you can't resize your attached storage, and you can't manage multiple attached volumes, which is useful for separating out PGDATA and WAL storage, a well-known Postgres benefit. And you can't choose whether to recreate or reuse storage in the case of a node outage.

The CNP operator does not rely on StatefulSets. Instead, CNP directly manages the underlying storage, so the correct procedure can be followed depending on the state of the attached storage after a workload disruption. I know it's getting a bit technical, but this is a huge benefit of the CNP operator. Effectively, you still leverage everything awesome about Kubernetes and automation, and you also get a product that runs that automation with domain-specific knowledge, so you know your workloads are being handled intelligently. It's not a one-size-fits-all approach with a ‘that'll do’ level of competence. For me, that addresses the stateful workloads question, which is a fairly common concern.
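
The difference between one fixed behavior and a domain-aware choice can be sketched as a small decision function. After a disruption, the function picks a recovery path based on the state of the instance's storage: reuse the volume, wait for its node, or recreate the volume. The enum, the function, its parameters, and the policy itself are hypothetical illustrations, not CloudNativePG's actual logic or API.

```python
from enum import Enum


class StorageAction(Enum):
    REUSE = "reattach the existing volume and let the instance catch up"
    WAIT = "keep the volume and wait for its node to come back"
    RECREATE = "provision a fresh volume and re-clone it from the primary"


def choose_storage_action(volume_reachable: bool, node_expected_back: bool) -> StorageAction:
    """Hypothetical policy for an instance whose pod was disrupted.

    volume_reachable: the instance's volume can still be attached somewhere,
        for example network-attached storage that survived the outage.
    node_expected_back: the volume is tied to a node that is only temporarily
        down, for example local storage during planned maintenance.
    """
    if volume_reachable:
        return StorageAction.REUSE
    if node_expected_back:
        return StorageAction.WAIT
    return StorageAction.RECREATE


if __name__ == "__main__":
    for reachable, back in [(True, False), (False, True), (False, False)]:
        action = choose_storage_action(reachable, back)
        print(f"reachable={reachable}, node_back={back}: {action.value}")
```

The point is not this particular policy. It is that an operator which owns the storage lifecycle gets to make this choice at all, where a StatefulSet would apply the same behavior every time.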

Other things… these are fairly complex concepts to get through, especially when you're embracing the new potential of things like dynamic auto-scaling and resource allocation. It can be quite a lot all at once when moving to a Kubernetes-based system. Another interesting issue can be the crossover between devs and DBAs, and the conflation of those two roles in some ways. Roles and responsibilities sometimes have to be redefined a bit, as the operator takes over some parts of the DBA role. But again, this is another area of business value in general. More time freed up for DBAs to work on more interesting tasks, like data modeling and data architecture, is generally seen in a really positive light.

Change is rarely easy, or even pleasant, for many people. But I really think the benefits that can be gained from moving to a Kubernetes-based platform massively outweigh any issues with learning curves or changing roles and responsibilities. In other words: is the pain of change greater or less than the pain of staying the same? That's always a good question to ask with Kubernetes workloads.

Another easy trap to fall into is taking a monolithic system designed for a bare-metal or VM world and dumping it wholesale into a Kubernetes framework without, for example, breaking it down into microservices. That's a very easy way to give yourself a lot of work without actually taking advantage of the benefits of a cloud native system.

You can get yourself a Kubernetes cluster as a service from pretty much anywhere these days. We have an awesome interactive demo as part of our documentation as well, so it's pretty easy to spin up the operator, give it a go yourself, and you too can be completely enthralled by how ridiculously easy it is to get yourself a highly available Postgres cluster right out of the box.

So yeah, I'd encourage you to go and play. Have fun!
