The EDB Blog
September 11, 2019

Scalable Replication Tools

Scalable replication tools are a key component for Postgres data integration and migration. Compatible with data from Oracle or SQL Server, scalable replication tools can replicate data to EDB Postgres Advanced Server or open source PostgreSQL.

Why is replication important in the context of Postgres? For one, many organizations use replication for offloading. As an example, an application can be built that gives a business's customers their order status without touching the core order-processing system: using Postgres, the order status data can be replicated and made available through a website.

Replication also plays a significant role in zero-downtime migrations and upgrades, and in keeping geographically distributed systems in sync. If you’re running systems in Australia and Europe, for instance, and want to reconcile their data with the data on another system based in the U.S., replication makes that possible.

Postgres Replication Capabilities

Postgres includes physical streaming replication capabilities, and starting in Postgres version 10, built-in logical replication is also supported. However, the built-in logical replication features have yet to mature enough to provide the same capabilities as those currently supported in EDB Postgres Replication Server (EPRS).

Through a failover manager for high availability, minor upgrades can be applied, delivering minor version changes without taking down your whole solution. But major upgrades still require that all members of a streaming replication cluster remain on the same binary version, meaning major upgrades can’t be carried out without system downtime when only streaming replication is used.

Logical replication, which is implemented in EDB Postgres Replication Server, makes those capabilities possible. This solution drives extremely high availability, keeping Postgres available above and beyond what’s possible with standard streaming replication and reaching three nines, four nines, or even better availability.

Postgres Scalable Replication Architecture

The architecture of the scalable replication solution was designed to get around the limitation of a single point of failure in the XDB application server, as well as to improve replication scalability and meet extended data configuration requirements. A key component for managing data flow is the Kafka framework, a distributed, highly scalable messaging system that provides a fault-tolerant mechanism to enable data flow across different geographies. Kafka also provides highly optimized I/O, through zero-copy data transfer and an efficient storage format.
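
To make the data flow concrete, the sketch below shows a change event being published to a Kafka topic with the kafka-python client. The broker address, topic name, and record layout are illustrative stand-ins, not EPRS's internal format.

```python
# Minimal sketch of a change event flowing into Kafka (kafka-python client).
# Broker address, topic name, and record shape are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A captured row change from the publication database (illustrative shape).
change_event = {"table": "orders", "op": "UPDATE", "id": 1001, "status": "SHIPPED"}

producer.send("eprs.orders.changes", value=change_event)
producer.flush()
```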

Another key function of the new architecture is the capability to offload the data load from the main production database. In the existing solution, if there is a requirement to add more nodes, the data load happens out of the source publication database. Under the Kafka framework, the data can be maintained at the Kafka level, where a replicated copy is kept subject to available disk space. The data can be assigned a retention policy and remain as long as required.
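
The sketch below illustrates the offloading idea under the same assumptions: a newly added node can be seeded by replaying the retained topic from the earliest offset, rather than re-reading the source publication database. The topic name, group id, and apply_change helper are hypothetical.

```python
# Sketch: seeding a newly added subscription node from Kafka instead of the
# source publication database. Names are illustrative, not EPRS internals.
import json
from kafka import KafkaConsumer


def apply_change(change):
    # Placeholder: in practice this would apply the row change to the new node.
    print(change)


consumer = KafkaConsumer(
    "eprs.orders.changes",
    bootstrap_servers="broker1:9092",
    group_id="new-subscription-node",
    auto_offset_reset="earliest",   # replay retained history, not just new changes
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    apply_change(message.value)
```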

AVRO Storage Format

A real benefit that Kafka provides is the use of AVRO as the storage format, which enables optimal data transmission. AVRO is a very compact and fast binary format that provides optimal data storage and data flow without using compression itself, allowing compression to be applied on top of it. Use of AVRO eliminates the need to transmit the schema definition as part of the message body and reduces the per-message payload to just the data. On the consumer side, the schema is retrieved from the schema registry as the AVRO message is deserialized.
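
As a rough illustration of why this keeps messages small, the following sketch uses the fastavro library: the schema is parsed once (in a real deployment it would live in the schema registry), and each message body carries only the compact binary encoding of the data. The schema and record shown are illustrative, not EPRS's actual message layout.

```python
# Sketch of schema-less AVRO encoding: no schema travels with the message body.
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

# Illustrative schema; in practice it would be registered in a schema registry.
schema = parse_schema({
    "type": "record",
    "name": "OrderChange",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "status", "type": "string"},
    ],
})

# Producer side: only the encoded data is written to the payload.
buf = io.BytesIO()
schemaless_writer(buf, schema, {"id": 1001, "status": "SHIPPED"})
payload = buf.getvalue()

# Consumer side: the same schema (fetched from the registry in practice)
# is used to deserialize the compact binary payload.
record = schemaless_reader(io.BytesIO(payload), schema)
print(record)  # {'id': 1001, 'status': 'SHIPPED'}
```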

Configuration of the Replication Network

To facilitate the configuration of the replication network, lifecycle management is handled through the EPRS server and Kafka is used to provide the actual data replication. For example, in a three-node cluster, there will be three different brokers engaged in a given replication network, and Kafka automatically makes sure that data is replicated to the other brokers as soon as it is written to a topic on one broker. ZooKeeper is also used to provide node coordination, failure detection, and failover capabilities. Through this architecture, there is no longer a single point of failure.
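
A minimal sketch of that setup, assuming three brokers at hypothetical addresses: creating the replication topic with a replication factor of 3 means every broker holds a copy of the stream, so losing one broker does not lose data.

```python
# Sketch: a topic with replication factor 3 in a three-broker cluster.
# Broker addresses and topic name are illustrative.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"]
)

topic = NewTopic(name="eprs.orders.changes", num_partitions=3, replication_factor=3)
admin.create_topics(new_topics=[topic])
```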

Since Kafka maintains automatic replication, the relevant topics are replicated across the whole cluster, where the data remains. If one database goes down while data is being generated by another database, that data remains in the Kafka topics and stays available for the period defined by the retention policy. A high retention policy is recommended: the default value is seven days, but it can be increased depending on the specific workload and use cases.
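
For example, the topic retention can be raised above the seven-day default with an admin call like the one below (retention.ms is in milliseconds); the topic name and the 30-day value are illustrative.

```python
# Sketch: extending topic retention beyond the 7-day default so change data
# survives longer subscriber outages. Names and values are illustrative.
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

admin = KafkaAdminClient(bootstrap_servers="broker1:9092")

thirty_days_ms = str(30 * 24 * 60 * 60 * 1000)
resource = ConfigResource(
    ConfigResourceType.TOPIC,
    "eprs.orders.changes",
    configs={"retention.ms": thirty_days_ms},
)
admin.alter_configs([resource])
```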

Access Additional Resources

Whether you need high availability or offloading capabilities, scalable replication in Postgres is a valuable feature. To learn more, watch our webinar on the topic. And for more on how to configure EDB Postgres Advanced Server’s scalable replication capabilities, download a complete tutorial.


Zahid Iqbal is VP of Replication and Migration Tools, and leads the development of EDB Postgres Replication Server. Zahid has been with EDB since 2004, the year the company was founded, and has been instrumental in the design and development of migration and replication solutions. Prior to EDB, Zahid...