Postgres Plus xDB Replication Server with Multi-Master User's Guide : 2.4 Designing a Replication System
Postgres Plus Advanced Server xDB Replication Server User's Guide
2.4 Designing a Replication System
This section presents the general steps, design considerations, and best practices for designing a replication system before you begin the actual implementation.
The following steps provide a general guideline for implementing a replication system.
Step 1: Determine if xDB Replication Server is the right solution for your requirements and you have chosen the best solution for your particular needs. xDB Replication Server can be used to implement single-master or multi-master replication systems. For single-master replication systems, the distinguishing characteristic of xDB Replication Server is its ability to replicate from an Oracle database to a PostgreSQL or Advanced Server database, from a SQL Server database to a PostgreSQL or Advanced Server database, from a Postgres Plus Advanced Server database to an Oracle database, or from a PostgreSQL or Advanced Server database to a SQL Server database.
Step 2: Plan the general strategy of how you will use xDB Replication Server. Will the single-master or multi-master model best suit your needs? (See Section 2.1 for use case examples of single-master and multi-master replication systems.) Will you be replicating from Oracle to Postgres, from SQL Server to Postgres, from Advanced Server to Oracle, or from Postgres to SQL Server? Will you be replicating between PostgreSQL and/or Advanced Server databases? How often will you need to replicate the data? Will replication be done on an ad hoc basis or does it need to occur regularly according to a schedule?
Step 3: Plan the logistics of your replication system. How many tables do you expect to replicate and what are their sizes in total number of bytes and number of rows? What percentage of rows do you expect to have been changed on each table between each replication? Are your database servers required to run on dedicated machines?
Step 4: Design your replication system. Determine whether your replication system will be distributed or will run on a single host. Determine the publications and subscriptions you will need and their tables and views. Make sure your publication tables meet the requirements for an xDB Replication Server publication. See sections 2.4.2 and 2.4.3 for details.
Step 5: Implement and test your replication system in a test environment. Try out your replication system on a subset of your publication data to ensure the replication process works as expected. Make sure the resulting replicated tables can be used as expected in your application. Establish preliminary metrics on how long the replication process will be expected to take in your full production environment.
Step 6: Implement and test your replication system in your production environment.
Keep the following points in mind when designing a replication system:
● Multi-master replication is supported only on Postgres databases. In addition, Advanced Server databases must be running in the same compatibility mode – either all Oracle or all PostgreSQL.
● An Oracle table can be a member of at most one publication if all publications are subordinate to one publication database definition. However, an Oracle table can be a member of multiple publications if each publication is subordinate to a different publication database definition.
● A Postgres table can be a member of at most one publication.
● Each table used in a publication must have a primary key with the exception of tables in snapshot-only publications, which do not require a primary key.
● Make sure table definitions are well established before creating publications. Unless the DDL change replication feature is used as described in Section 7.6, if a table definition is changed, any publication containing the table along with its associated subscription must be deleted and recreated, otherwise replication may fail. The same applies for the table definitions in a master definition node and its associated master nodes. Replication failures can be seen in the replication history.
● Views can be members of snapshot-only publications. In the subscription database, a view is replicated as a table.
● A publication may have multiple subscriptions.
● A subscription can be associated with at most one publication.
● A database can contain both publications and subscriptions.
● A given publication server can support only one multi-master replication system. All master nodes created subordinate to a given publication server are assumed to be part of the same multi-master replication system.
● A table that is created as a result of a subscription can be used in another publication. Thus, a publication can replicate data to a subscription which in turn, can be used in a publication to replicate to another subscription, thus creating a cascaded replication architecture.
● There are restrictions on the combinations and configurations of database servers that can be used for a publication and its subscription. See Section 9.1 for details on these restrictions.
● All replication system components must be running in order for replication to occur, or before performing any configuration, operation, or modification in the replication system. (The xDB Replication Console is used for the configuration and modification of a replication system. The xDB Replication Console does not need to be running in order for replication to occur.)
● In general, the order of creation of a replication system is as follows: 1) Create the required physical databases, database user names, tables, and views to be used in the replication system. 2) Define the replication system logical components using the xDB Replication Console or xDB Replication Server CLI. 3) Perform replication.
● In general, the order of removal of a single-master replication system is as follows: 1) Remove the replication system logical components using the xDB Replication Console or xDB Replication Server CLI starting with the subscriptions (Subscription nodes) and then their parent components (Subscription Database nodes). 2) Unregister the subscription server if you no longer have any need for it. 3) Repeat the same process for the publications. 4) After all replication system logical components have been removed (except for possibly the publication server and subscription server) you can drop any of the physical database objects in Oracle, SQL Server, or Postgres. Do not drop the metadata database objects manually, for example by using an SQL command line utility. Doing so may cause the xDB Replication Console and xDB Replication Server CLI to become inoperable. (See Section 220.127.116.11 if this problem occurs.) Deleting the replication system logical components using the xDB Replication Console or xDB Replication Server CLI automatically drops the metadata database objects from the physical database.
● The order of removal of a multi-master replication system is as follows: 1) Remove the replication system logical components using the xDB Replication Console or xDB Replication Server CLI starting with the publication database definitions of the master nodes other than the master definition node. 2) Remove the publication from under the master definition node. 3) Remove the publication database definition of the master definition node. 4) After all replication system logical components have been removed (except for possibly the publication server) you can drop any of the physical database objects in Postgres. Do not drop the metadata database objects manually, for example by using an SQL command line utility. Doing so may cause the xDB Replication Console and xDB Replication Server CLI to become inoperable.
When a subscription is created in a single-master replication system, the table definitions and most database objects and attributes associated with the publication tables are created in the subscription database by the subscription server.
If you so choose, the same process can automatically occur when a master node is added to a multi-master replication system. The table definitions and most database objects and attributes associated with the publication tables can be created in the newly added master node by the publication server.
The following is a list of database objects and table attributes that are replicated from the publication in either a single-master or multi-master replication system.
● Views (for snapshot-only publications) created as tables in the subscription database
● Primary keys
● Not null constraints
● Unique constraints
● Check constraints
Note: Foreign key constraints are not replicated by the publication or subscription server in a single-master replication system. However, in a multi-master replication system, foreign key constraints are replicated from the master definition node to other master nodes.
xDB Replication Server does have some restrictions on the types of tables it can replicate.
Certain types of Oracle partitioned tables can be replicated. See Section 18.104.22.168 for details.
Oracle tables that include the following data types cannot be replicated:
Oracle tables with the following data types can be used in snapshot-only publications, but cannot be used in synchronization replications:
● LONG RAW
SQL Server tables that include the following data types cannot be replicated:
Note: See Section 9.4.6 for a method to replicate tables containing the SQL_VARIANT data type under certain conditions.
SQL Server tables with the following data types can be used in snapshot-only publications, but cannot be used in synchronization replications:
Postgres tables with the line geometric data type cannot be replicated.
This section discusses the processing differences between snapshot and synchronization replications and provides some general guidelines on which method to use.
22.214.171.124 What Happens During Snapshot Replication
In snapshot replication, the target tables are completely reloaded from the source tables. The database system’s truncate operation is used to delete all rows from the target tables.
For Oracle and SQL Server only: Oracle and SQL Server target tables are loaded using JDBC batches of INSERT statements.
For Postgres only: In general, Postgres target tables are loaded using the JDBC COPY command since using truncation and COPY is generally faster than if you were to execute an SQL DELETE statement against the entire table and then add the rows using JDBC batches of INSERT statements. If the COPY command fails, the publication server retries the snapshot using JDBC batches of INSERT statements.
If the target table (regardless of database type) contains a large object data type such as BYTEA, BLOB, or CLOB then rows are loaded one at a time per batch using an INSERT statement. This is to avoid a heap space error resulting from potentially large rows. Loading time can be decreased by allowing multiple inserts per batch, which is done by adjusting the configuration option lobBatchSize described in Section 5.8.1.
Note: Advanced Server supports a number of aliases for data types. Such aliases that translate to BYTEA are treated as large object data types. See the Oracle Compatibility Developer’s Guide for a listing of Advanced Server data types.
Under certain circumstances, the corresponding Postgres target table created for certain types of Oracle partitioned tables is a set of inherited tables. In these cases, the SQL DELETE statement is used on the inherited child tables instead of truncation. See Section 126.96.36.199 for additional information on replicating Oracle partitioned tables.
A server configuration option is available that forces the snapshot replication process to use the Oracle database link utility instead of JDBC COPY to populate the Postgres target tables from an Oracle publication. Oracle database link provides an additional performance improvement over JDBC COPY. See Section 5.8.1 for information on using the Oracle database link option.
See Section 5.8.1 for information on various configuration options to optimize snapshot replication.
In synchronization replication, changes made to the source tables since the last replication occurred are applied to the target tables using SQL INSERT, UPDATE, and DELETE statements. However, the SQL statements run against the target tables are not the same SQL statements that were run against the source tables.
If a publication in a single-master replication system is created that will be used in synchronization replications, the publication server installs an insert trigger, an update trigger, and a delete trigger on each publication table. In a multi-master replication system, each replicated table in every master node has an insert trigger, an update trigger, and a delete trigger.
The publication server also creates a shadow table for each source table. A shadow table is a table used by xDB Replication Server to record the changes (inserts, updates, and deletions) made to a given source table. A shadow table records three types of record images: For each row inserted into the source table, the shadow table records the image of the inserted row. For each existing row that is updated in the source table, the shadow table records the after image of the updated row. For each row deleted from the source table, the shadow table records the primary key value of the deleted row.
Note: In a multi-master replication system, the before image of an updated row is also stored in the shadow table in order to perform update conflict detection. See Section 6.6 for information on conflict detection in a multi-master replication system.
After each change on the source table, one of the insert, update, or delete triggers is executed. These are row triggers, so for each row affected by the change, the trigger executes. Each execution of the trigger records a row of the appropriate type (insert, update, or deletion) in the shadow table of the corresponding source table.
When synchronization replication occurs, the publication server executes JDBC batches of SQL statements (also referred to as transaction sets) against the target tables. The batches contain an INSERT statement for each shadow table row recording an insert operation, an UPDATE statement for each shadow table row recording an update operation, and a DELETE statement for each shadow table row recording a delete operation. Each batch is executed in one transaction.
Shadow table rows that were applied to target tables can be viewed as shadow table history in the xDB Replication Console (see Section 7.3.3).
Note: A single SQL statement executed against a source table may result in many rows recorded in a shadow table, and therefore, many SQL statements executed against the target table. For example, if a single UPDATE statement affects 10 rows in the source table, 10 rows will be inserted into the shadow table – one for each row in the source table that was updated. When the publication server applies the changes to the target table, 10 UPDATE statements will be executed.
Note: For greater efficiency, when changes to the source tables consist of SQL statements that each affect a large number of rows, the publication server may employ the use of prepared SQL statements. See Section 5.8.2 for directions on how to control the usage of prepared SQL statements as well as information on various other configuration options to optimize synchronization replication.
188.8.131.52 When to Use Snapshot or Synchronization
Generally, synchronization would be the quickest replication method since it only applies changes to the target tables since the last replication occurred.
However, if a large percentage of rows are changed between each replication, there may be a point where it would be faster to completely reload the target tables using a snapshot than to execute individual SQL statements on a large percentage of rows as would be done for synchronization replication. Experimentation may be necessary to determine if, and at what point a snapshot would be faster.
Snapshot replication may be an option for tables with the following characteristics:
● Tables are relatively small in size
● A large percentage of rows are changed between replications
Synchronization replication is the better option for tables with the following characteristics:
● Tables are large in size
● A small percentage of rows are changed between replications
In a single-master replication system, if you find that one group of tables consistently replicates faster using snapshot replication, then these tables can be made part of a snapshot-only publication while the remaining tables can be members of a publication that uses synchronization replication.
184.108.40.206 When to Use On Demand Replication
The xDB Replication Console and xDB Replication Server CLI both give you the capability to immediately start a replication. This is called an on demand replication.
On demand replication can be performed at any time regardless of whether or not there is an existing schedule. An on demand replication does not change the date and time when the next replication is scheduled to occur according to an existing schedule.
If a publication is a snapshot-only publication, then the only type of on demand replication that can be performed on this publication is a snapshot.
If a publication is not a snapshot-only publication, you can perform an on demand replication using either the snapshot method or the synchronization method.
When you are in the development and testing phases of your replication system, you would typically use on demand replication so that you can immediately force the replication to occur and analyze the results.
When your replication system is ready for production, a schedule would typically be used so that replications can occur unattended at regular time intervals. See Section 7.1 for directions on creating a schedule.
There may be other situations where you would want to force a replication to take place ahead of its normal schedule. Reasons for performing an on demand replication may include the following:
● The number of changes to the source tables is growing at a faster rate than usual, and you do not want to wait for the regularly scheduled synchronization time to replicate all of the accumulated changes.
● You have set up your replication system to perform synchronizations, but on this occasion there have been an unusually large number of changes made to the source tables, and you would rather perform a snapshot of all source tables rather than execute a large number of SQL statements against the target tables.
● Changes have been made directly to the rows of the target tables so that they no longer have the same content as their source table counterparts. You can perform an on demand snapshot replication to reload all rows of the target tables from your current set of source tables.
Note: In a multi-master replication system, on demand snapshots can only be made from the master definition node to another master node.
xDB Replication Server provides the flexibility of allowing you to run the replication system’s components on separate machines on a network.
In fact xDB Replication Server is designed so that it is possible to set up replication systems where each of the components (publication server, subscription server, xDB Control database, publication database, subscription database, and master nodes) may all run on the same host, each component may run on its own separate host, or any combination of components may run on any number of hosts.
However, for practical purposes, there are two basic scenarios. The simplest case is where all components are on the same host. The other case is where you have the Oracle or SQL Server database server running on a host separate from the rest of the replication system components.
This section discusses the advantages and disadvantages of each scenario.
220.127.116.11 Single Host
The simplest implementation of a replication system is when all replication components run on a single host. This means that the PostgreSQL or Advanced Server installation, the complete xDB Replication Server installation (publication server, subscription server, and xDB Control database), and the Oracle or SQL Server database server reside on the same machine.
The Postgres publication or subscription database as the case may be, can reside in the initial database cluster that is created when Postgres is installed on the host.
Figure 21 - Single host replication system
The advantages of a single host replication system are the following:
● There is a performance advantage since there is no network over which to push replication data, especially if large snapshots are involved.
● Configuration is much simpler. When creating the replication system logical components, the IP addresses of all components are the same.
The disadvantages of a single host replication system are the following:
● The replication system and the database servers all consume the resources of one machine, which can adversely affect database application performance.
● The publication and subscription databases may be in different geographic locations, thereby requiring multiple networked hosts.
● Your site may require the use of a dedicated host for the Oracle or SQL Server database server so xDB Replication Server could not reside on the same machine.
18.104.22.168 Single-Master Replication Distributed Hosts
xDB Replication Server allows you to build a replication system with either or both of the publication database and the subscription database on separate hosts. This is illustrated in the following diagram:
Figure 22 - Oracle database server on distributed host
The same remote distribution can be used for the subscription database instead of, or in addition to the publication database.
The advantages of a distributed host replication system are the following:
● The replication system and the database servers can each consume the resources of their own machines, which can be individually selected and tuned.
● The publication and subscription databases can be in different geographic locations.
● You can enforce stronger database security if only the database server is allowed to run on a host.
The disadvantages of a distributed host replication system are the following:
● There may be a performance disadvantage since there is a network over which to push replication data, especially if large snapshots are involved.
● Installation is more complex if the Postgres database must run on a different host than xDB Replication Server. This involves installing Postgres on two separate hosts.
● Configuration is more complex. The network and firewalls must be properly configured to allow the distributed components to communicate. When creating the replication system logical components, the correct IP addresses of all components must be used. In addition, the correct IP addresses must be kept up-to-date in the replication system metadata should they change in the networked environment.
22.214.171.124 Multi-Master Replication Distributed Hosts
In a multi-master replication system, the Postgres database servers running the master nodes can be running on a single or multiple hosts. The following example illustrates two master nodes running on database servers on separate hosts as well as a master node running on the same database server as the publication server.
Figure 23 - Multi-master replication on distributed hosts