Geographically Distributed PostgreSQL: A New Dawn on the Horizon for High Availability

Jan Karremans

August 08, 2022

How it all began

I could begin this story by telling you how modern IT makes the world smaller and that, therefore, we need our applications to be present and operational just about everywhere. But that would just be nonsense.

The origins of technologies like the one we will be discussing in this blog post go back at least a decade, if not more. And as with all developments like this, they are haunted by misconceptions and misunderstandings. There is one close to my heart that I would like to highlight as an example.

Scaling PostgreSQL horizontally

Some years ago, one of our customers, the government medical office in one of the largest cities in Eastern Europe, was looking to implement an e-medication application in the city. For both risk mitigation and high availability, an “Active-Active” Postgres implementation was required.

But what does "Active-Active" actually mean? This question occupied the majority of our meeting when I paid them a visit years ago. Looking at basic questions such as the Recovery Time Objective and the expected number and type (read or write) of transactions that the application would create, we were able to determine that, in this case, "Active-Active" actually meant "Primary-Replica." The funny thing, adding complexity as it does, is that the two can actually be the same. In the chosen setup, both the primary and the replica Postgres clusters are active: one for read queries and the other for read and write queries.

	Multiple read nodes	Multiple write nodes	Geo distribution
Active-Active	Yes	Depends	Depends
Primary-Standby	Yes	No	Not so good
Multi Master	Yes	Yes	Yes
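
To make the difference between these rows a bit more tangible, here is a minimal sketch of how an application-side router could treat them. Everything in it is an assumption made for illustration: the `Node` and `Router` names are hypothetical, and classifying a statement by its first keyword is deliberately naive (it would misjudge `SELECT ... FOR UPDATE`, for example). It is not how any particular Postgres tool does it.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Postgres cluster in the setup."""
    name: str
    accepts_writes: bool


class Router:
    """Naive statement router: reads go to any node, writes only to writers."""

    def __init__(self, nodes):
        self.readers = list(nodes)  # every node can serve read queries
        self.writers = [n for n in nodes if n.accepts_writes]
        if not self.writers:
            raise ValueError("at least one node must accept writes")
        self._read_cycle = itertools.cycle(self.readers)
        self._write_cycle = itertools.cycle(self.writers)

    def route(self, sql: str) -> str:
        # Assumption: a statement is a read if it starts with SELECT or SHOW.
        is_read = sql.lstrip().lower().startswith(("select", "show"))
        cycle = self._read_cycle if is_read else self._write_cycle
        return next(cycle).name


# "Primary-Replica": both nodes are active, but only one takes writes.
primary_replica = Router([Node("primary", True), Node("replica", False)])

# "Multi Master": every node takes reads and writes.
multi_master = Router([Node("node-a", True), Node("node-b", True)])
```

With `primary_replica`, every `INSERT` or `UPDATE` lands on `primary` while `SELECT` statements alternate between both nodes. With `multi_master`, writes alternate as well, which is exactly where conflict handling starts to matter.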

So it all starts at the beginning—with your project requirements.

The power of multi-master data replication

This is where it gets a little more interesting.

There is something about PostgreSQL that might not necessarily be unique, but it is very interesting: its "Shared Nothing Architecture." It means that every PostgreSQL cluster is a fully stand-alone installation; it does not give you any of the dirty tricks that allow memory or storage to be shared or connected. That might not sound ideal for clustering, as some examples—such as Oracle's Real Application Clusters (RAC) and Digital Equipment Corporation’s (DEC) Virtual Memory System (VMS), now maintained by VMS Software Inc.—have shown that for good clustering, this "interconnect" is important.

Why is EDB Postgres Distributed interesting then, you might ask yourself? The answer lies herein: each PostgreSQL cluster is fully self-contained, which means that technical issues that affect one cluster do not necessarily have to influence the other cluster(s). If you then take into consideration PostgreSQL's streaming replication, which allows PostgreSQL clusters to tune into the transactions that are happening in other PostgreSQL clusters, you have a recipe for success. Let's explore.

Extreme high availability

In a previous blog post, we covered what extreme high availability means for modern application infrastructures. Additionally, there is one more factor that contributes to the extremely high availability of an application, and that is latency. Your database platform may be 99.999% available, but if the user of your application cannot get to your database platform quickly enough…well, you will fail. One reason could be that your database is running in a data center in New York City while your users are trying to regularly access it from Tokyo, Japan.
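
To put a rough number on that New York to Tokyo example, the sketch below estimates a lower bound for a single network round trip. The coordinates, the mean Earth radius and the figure of about 200,000 km/s for light in optical fiber are my own approximations, and real routes are longer than the great-circle path, so treat the result as a floor rather than a measurement.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius (approximation)
FIBER_SPEED_KM_PER_S = 200_000.0  # roughly two thirds of c (assumption)

# Approximate city-center coordinates (latitude, longitude) in degrees.
NEW_YORK = (40.7128, -74.0060)
TOKYO = (35.6762, 139.6503)


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlambda = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_round_trip_ms(distance_km):
    """Best-case round trip over fiber, ignoring routing and processing."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


distance = great_circle_km(NEW_YORK, TOKYO)
round_trip = min_round_trip_ms(distance)
print(f"{distance:.0f} km, at least {round_trip:.0f} ms per round trip")
```

Under these assumptions, every round trip between the two cities costs on the order of a hundred milliseconds before the database has done any work, and a transaction that needs several round trips pays that price several times over.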

Let us pick this apart a bit more...

Oracle RAC

The classic example of a tool to achieve extreme high availability is Oracle RAC. The good news for PostgreSQL users is that most of the magic of this tool has now been matched by EDB Postgres Distributed. It would be repetitive if I went into an explanation about why that is and how it works; check out this white paper for those juicy details.

Oracle’s downside, however, is that RAC does not offer geographic distribution. This is an area where PostgreSQL fundamentally exceeds Oracle’s hallmark solution. The shared architecture that enables RAC to work for its intended purpose also prevents a RAC cluster from being extended beyond the range that the required extreme networking can span, which is typically no further than its local data center.

Data Guard or GoldenGate

More tools to the rescue!

Let’s consider Oracle's (Active) Data Guard or even Oracle GoldenGate to help move the data from here to yonder. That might sound like a very good idea, and many companies have successfully adopted these solutions to create local pockets of extreme availability.

Besides the fact that it dilutes the overall solution (basically, this is now an expensive combination of multiple RAC clusters, connected by standby database technology), it also adds tremendous complexity. Setting up a RAC cluster successfully is a good challenge, but combining several of them with, for example, Oracle GoldenGate basically takes a small village of extremely well-trained specialists.

Sharding

As a little side-step, I want to spend a few words on sharding. Especially when it comes to distributed computing, the various forms of and algorithms for sharding are likely to gain momentum and importance. For both globally and locally distributed workloads, sharding might just be something that fits your needs.

By not having all of the data available everywhere, you can significantly reduce the volume of changes that you need to share and agree upon. For geographically distributed workloads, this might save significantly on operating costs, as you pay for every single bit you share. For locally distributed workloads (think of microservices architectures), you can end up with locally relevant data pockets and shared data pockets. If that is of interest to you, you might want to read this article.
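
As a minimal sketch of that idea, assume a handful of regions, one table that is shared everywhere and other tables that live only in a "home" region chosen by hashing a shard key. The region names, table names and function names below are all hypothetical, and hash-modulo placement is just one of many possible sharding schemes.

```python
import hashlib

# Hypothetical regions and table classification, for illustration only.
REGIONS = ["new-york", "tokyo", "amsterdam"]
SHARED_TABLES = {"medication_catalog"}


def home_region(shard_key: str, regions=REGIONS) -> str:
    """Pick a stable home region for a shard key via hash-modulo placement."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return regions[int.from_bytes(digest[:8], "big") % len(regions)]


def replication_targets(table: str, shard_key: str, regions=REGIONS) -> list:
    """Return the regions that need to receive a change to this row."""
    if table in SHARED_TABLES:
        return list(regions)                 # shared pocket: goes everywhere
    return [home_region(shard_key, regions)]  # local pocket: stays at home
```

A change to `medication_catalog` has to travel to all three regions, while a change to a row keyed by, say, `patient-42` in a local table is only stored in one of them. That difference is the saving in shared volume described above.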

PostgreSQL: a new dawn

It's a new dawn
It's a new day
It's a new life
The famous lines from “Feeling Good” by Nina Simone

Sometimes certain lines in software development and business requirements converge and create something cool. With a growing demand for geographically distributed workloads on the one hand and the transformation of application development on the other, truly a new day is dawning.

EDB Postgres Distributed

In previous articles, blog posts and white papers, we have demonstrated the incredible things EDB Postgres Distributed can do for extreme high availability of PostgreSQL. Up to 5 nines of availability translates to just 315 seconds of downtime per year! As a matter of fact, EDB Postgres Distributed opens up a whole new dimension in this respect.
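
The arithmetic behind that number is easy to check. The sketch below assumes a 365-day year and converts a number of nines into the downtime budget per year; the function name is mine, purely for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    Five nines means 99.999% available, so 10**-5 of the year is downtime.
    """
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability


for nines in (3, 4, 5):
    print(f"{nines} nines: {downtime_seconds_per_year(nines):.2f} s per year")
```

For five nines this comes to 315.36 seconds, a little over five minutes per year, which matches the figure quoted above.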

Distributed database technology

According to Wikipedia:
A distributed database is a database in which data is stored across different physical locations. It may be stored in multiple computers located in the same physical location (e.g. a data center); or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

Based on this definition, we can conclude that EDB Postgres Distributed enables Postgres to be a distributed database management system. If we treat and use EDB Postgres Distributed in that manner, we stand to gain much from the unique qualities of distributed systems, which go beyond extremely high availability.

Key value pair storage architecture

The growing desire to have a distributed database management system has supported the growth of this technology for some time now. The largest part of this software was created for very specific use cases, like Google Spanner, initially developed to support Google Ads.

As distributed database technology gradually becomes more mainstream, creating solutions for the world's premier relational database, PostgreSQL, becomes desirable. Meanwhile, we have seen a few of these database implementations become available, with stronger or weaker PostgreSQL compatibility.

As a PostgreSQL person, I find that a major drawback is that these solutions might be PostgreSQL compatible, but in fact, they are not PostgreSQL. To be able to make the distributed capabilities of these solutions work, they rely on alternate storage or transaction management systems, including the implementation of a Key Value Pair (KVP) storage architecture.
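
To give a feel for what a key-value storage layer underneath a relational interface can look like, here is a deliberately simplified sketch that flattens one table row into key-value pairs and reassembles it. The key layout and function names are invented for illustration and are not the encoding used by any specific product.

```python
def row_to_kv(table: str, primary_key: str, row: dict) -> dict:
    """Flatten one relational row into '/table/pk/column' -> value pairs."""
    return {f"/{table}/{primary_key}/{column}": value
            for column, value in row.items()}


def kv_to_row(table: str, primary_key: str, pairs: dict) -> dict:
    """Reassemble a row by collecting every key under its '/table/pk/' prefix."""
    prefix = f"/{table}/{primary_key}/"
    return {key[len(prefix):]: value
            for key, value in pairs.items()
            if key.startswith(prefix)}


# Hypothetical example row.
row = {"name": "ibuprofen", "dose_mg": 400}
pairs = row_to_kv("medication", "17", row)
# pairs now holds one entry per column, e.g. "/medication/17/name".
restored = kv_to_row("medication", "17", pairs)
```

Spreading sorted keys like these across machines is comparatively simple, which is part of the appeal. The flip side is that the storage and transaction layers are no longer those of PostgreSQL itself.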

PostgreSQL native

One of the coolest attributes of EDB Postgres Distributed is that it is actually a PostgreSQL extension. It leaves your tried, tested and battle-proven database system intact and simply adds distributed database capabilities on top. How cool is that?

The stronger PostgreSQL becomes, the more capabilities your distributed system will have. Whether your goal is to run a geographically distributed installation, or you are looking to build a killer application on state-of-the-art microservices technology, including Cloud Native Postgres on Kubernetes, it all just works. Better yet, it will keep on working. In 2022 full support for EDB Postgres Distributed will also become available with Cloud Native Postgres.

Conclusion

Distributed databases are coming. They will be here in multiple forms. For Postgres, this is nothing new. We have seen technology fork from PostgreSQL in different shapes and sizes, only to be overtaken by the original in the end.

If you bet, bet on PostgreSQL. That is no different when it comes to using EDB Postgres Distributed. It is, in the end, just a new dawn on the horizon for PostgreSQL.

Want to learn more? Read our white paper, Postgres Distributed: The Next Generation of PostgreSQL High Availability!
