5.8 Optimizing Performance

Table of Contents Previous Next


5 Single-Master Replication Operation : 5.8 Optimizing Performance

Note: Most of these configuration options are applicable to multi-master replication systems as well. Options applicable to multi-master replication systems are those that apply to the publication server and are not specific to a database product other than Postgres (such as an Oracle feature).
Note: The options described in this section apply to the publication server only and are set in the publication server configuration file unless otherwise specified.
When the copyViaDBLinkOra option is set to true, the Oracle database link API, dblink_ora, is used instead of JDBC COPY to populate Advanced Server subscription tables from an Oracle publication during snapshot replication.
Note: The Oracle database link API feature is not available with PostgreSQL, therefore the copyViaDBLinkOra option is not applicable to PostgreSQL subscription tables.
Note: Prior to using dblink_ora with xDB Replication Server, there are a number of required configuration steps that must be performed in Advanced Server. For Advanced Server versions 9.3 or earlier, see the readme text file, README-dblink_ora_setup.txt located in the POSTGRES_INSTALL_HOME/doc/contrib directory for directions. For Advanced Server versions 9.4 or later, see Chapter dblink_ora in the Database Compatibility for Oracle Developer’s Guide for directions.
Set the useFastCopy option to true to skip Write-Ahead Log (WAL) logging during COPY operations in order to optimize data transfer speed.
The archive_mode configuration parameter in the postgresql.conf file of the target Postgres database server must be off (thereby disabling archiving of WAL data) in order to use the useFastCopy option.
Use the cpBatchSize option to set the batch size (in Megabytes) that is used in the JDBC COPY operation during a snapshot. Increase the value of this option for large publication tables.
This option is influential when Postgres is the subscription database since the JDBC COPY operation is used to load Postgres subscription tables.
The batchSize option controls the number of INSERT statements in a JDBC batch.
For a Postgres subscription database, tables are loaded using JDBC COPY, however, if the COPY operation fails for some reason, then table loading is retried using JDBC batches of INSERT statements as in the case of Oracle and SQL Server.
Set the skipAnalyze option to true if you want to skip execution of the ANALYZE command after loading Postgres subscription tables. The ANALYZE command gathers statistical information on the table contents. These statistics are used by the query planner.
Note: To apply this option to a single-master replication system, it must be set for the subscription server within the subscription server configuration file. To apply this option to a multi-master replication system, it must be set for the publication server within the publication server configuration file.
The snapshotParallelLoadCount option controls the number of threads used to perform snapshot data replication in parallel mode. The default behavior is to use a single thread. However, if the target system architecture contains multi-CPUs/cores you can specify a value greater than 1, normally equal to the CPU/core count, to fully utilize the system resources.
If a table contains a column with a data type typically used for large objects such as BYTEA, BLOB, or CLOB, there is a greater possibility that a heap space error may occur because of a potentially large amount of data (hundreds of megabytes) brought into memory. In order to minimize the possibility of this error, a snapshot replication loads tables containing a large object data type, one row at a time using a single INSERT statement per batch.
Note: The options described in this section apply to the publication server only and are set in the publication server configuration file.
When synchronization replication occurs, the changes recorded in the shadow tables are applied to the subscription tables in JDBC batch updates. Within each batch, changes may be applied using either an individual SQL statement for each change; or a set of changes may be applied using a single, prepared SQL statement. A prepared SQL statement is parsed and compiled only once, but it can be executed multiple times using different values for certain components of the SQL statement in each execution. A SQL statement that is not prepared is parsed, compiled, and executed only once.
Prepared statements are useful only if the same type of SQL statement (INSERT, UPDATE or DELETE) is executed repeatedly and consecutively with the same target table, but with different values. If there is a sequence of consecutive changes that occur to the same table using the same operation such as inserting a set of rows into the same table populating the same columns, the publication server may apply these changes using a prepared statement. Otherwise, each change is applied with its own individual SQL statement.
The defaultBatchUpdateMode option controls whether the default mode is to use individual SQL statements in the JDBC batch update (this mode of operation is referred to as BUS) or to use prepared SQL statements in the JDBC batch update (this mode of operation is referred to as BUP).
The switchBatchUpdateMode option controls whether or not the publication server dynamically switches between BUS mode and BUP mode during the replication process depending upon the type and sequence of updates it encounters in the shadow tables for the trigger-based method or the changeset stream for the log-based method.
This means using the default settings of defaultBatchUpdateMode=BUS and switchBatchUpdateMode=true, the publication server starts out by applying updates with individual SQL statements. When it encounters a stream of consecutive changes that can all be processed in a single prepared statement, it will switch to using prepared SQL statements.
Note: If you want a certain batch update mode used throughout all synchronization replications applied by a given publication server without switching update modes, set the defaultBatchUpdateMode option to the desired mode in combination with switchBatchUpdateMode=false. For example, if you only want prepared statements used, set the following options:
Note: When Oracle is the subscription database, synchronization replication always occurs in BUP mode as if the preceding two options were always set. The reason for this is so large columns of TEXT data type from Postgres publications can successfully replicate to Oracle CLOB columns. In BUS mode an individual Oracle SQL statement has a string literal maximum length of 4000 characters. This limitation does not occur for prepared SQL statements that are used in BUP mode.
The busBatchThresholdCount option sets the number of consecutive updates of the same type that must be encountered in the shadow tables for the trigger-based method or the changeset stream for the log-based method before the publication server switches from BUS mode to BUP mode if dynamic switching is permitted (that is switchBatchUpdateMode=true).
If changes to the publication were made using many SQL statements where each statement affected more than one row, then it may be beneficial to lower busBatchThresholdCount to encourage the use of prepared statements on the multiple shadow table rows resulting from each individual change on the publication.
The bupBatchThresholdCount option is used in combination with the bupBatchThresholdRepeatLimit option to control the frequency of mode switches based on the volatility of expected update types to the publication.
Each time the same prepared SQL statement is consecutively executed, an internal “batch” counter is incremented. If this batch count falls below bupBatchThresholdCount for the number of executions of a given prepared statement, then a second internal “repeat” counter is incremented by one. If the repeat counter eventually reaches bupBatchThresholdRepeatLimit, the update mode is switched from BUP to BUS.
Thus, if there are frequent, consecutive changes of prepared SQL statements (as measured against bupBatchThresholdRepeatLimit), each of which is executed a small number of times (as measured against bupBatchThresholdCount), then the mode of execution changes back to individual SQL statements instead of prepared statements.
Note: The publication server changes back to prepared statements when the threshold set by busBatchThresholdCount is met.
The following example illustrates the processing of updates when bupBatchThresholdCount is set to 3 and bupBatchThresholdRepeatLimit is set to 4. A change to the “query domain” referred to in this example means a different statement type (INSERT, UPDATE, or DELETE) or a different target table are encountered in the next update, thus requiring the use of a different prepared SQL statement.
At this point the query domain is changed after the first two updates (change from table emp to dept) and the number of executions of the prior prepared statement (2) is less than bupBatchThresholdCount, so the repeat counter is set to 1.
The query domain is changed again (change from table dept to emp), but this time the number of executions (4) for the same query domain (updates 3 thru 6) exceeds bupBatchThresholdCount so the repeat counter is reset to 0.
The query domain is changed again (INSERT statement to UPDATE statement) and the number of executions (2) is less than bupBatchThresholdCount, so the repeat counter is incremented to 1.
Parallel synchronization takes advantage of multi-CPUs or cores in the system architecture by using multiple threads to apply transaction sets in parallel.
The syncLoadThreadLimit option controls the maximum number of threads used to load data from source publication tables during parallel synchronization. The default count is 4. However, depending on the target system architecture (specifically, multi-CPUs/cores) you can choose to specify a custom count, normally equal to the CPU/core count, to fully utilize the system resources.
The dataSyncThreadCount option controls the maximum number of threads used to apply incremental changes during synchronization replication to the target slave databases (for single-master replication systems) or to the target master nodes (for multi-master replication systems) in parallel mode. The default behavior (when dataSyncThreadCount is set to 0) is to use as many threads as there are target nodes.
The targetDBQueryTimeout option controls the timeout interval (in milliseconds) before an attempt by the publication server to apply a transaction set on a target database is aborted by the database server (typically due to a lock acquired by another application on one or more of the target tables).
The targetDBQueryTimeout option sets the default lock timeout value to 10 minutes. Change the 10 minute default value to a higher value if you want to allow a longer wait time before the transaction is aborted. Change the value to 0 if you want to turn off usage of the targetDBQueryTimeout option in which case the timeout interval is controlled by the setting of the Postgres database server statement_timeout configuration parameter.
A higher value of targetDBQueryTimeout delays processing of subsequent transaction sets on other target databases because if a transaction set is blocked, the next transaction set cannot be loaded until: 1) the lock is released and the blocked transaction set can then be applied to completion, or 2) the targetDBQueryTimeout interval is exceeded.
The syncBatchSize option controls the number of statements in a synchronization replication JDBC batch.
The syncFetchSize option controls how many rows are fetched from the publication database in one network round-trip. For example, if there are 1000 pending row changes, the default fetch size requires 5 database round-trips. Using a fetch size of 500 retrieves all changes in 2 round trips.
The txSetMaxSize option defines the maximum number of transactional rows that can be grouped in a single transaction set. The publication server loads and processes the changes by fetching as many rows in memory as grouped in a single transaction set.
Set enablePerformanceStats option to true only if you need to conduct performance testing and analyze the replication statistics. When enabled, the publication server creates additional triggers on the publication tables in each master node. The triggers produce transaction statistics that are recorded in the mmr_transaction_history table in the control schema. This option should be disabled in a production environment to avoid performance overhead.

5 Single-Master Replication Operation : 5.8 Optimizing Performance

Table of Contents Previous Next