Settings v1

General Settings

  • logical_replication_mode: Affects how the program interprets connections and table filter settings (see more details below), and also what requirements to check for in the connections before starting the comparison. Currently the possible values are:

    - `off`: Assumes there is no logical replication between the databases;
    
    - `native`: Assumes there is native logical replication between the
    databases. Enables the usage of the `Table Filter -> publications`
    setting to specify the list of tables to be used. Requires PostgreSQL 10+ on
    all databases.
    
    - `pglogical`: Assumes there is pglogical replication between the databases.
    Enables the usage of the `Table Filter -> replication_sets` setting to
    specify the list of tables to be used. Also enables the usage of `node_name`
    to specify the data connections, which require setting the `Initial
    Connection` that is used to retrieve DSN information of the nodes. Requires
    the `pglogical` extensions to be installed on all databases.
    
    - `bdr`: Assumes all data connections are nodes from the same BDR cluster.
    Enables usage of `Table Filter -> replication_sets` setting to specify list
    of tables to be used. Also enables usage of `node_name` to
    specify the data connections, which require setting `Initial Connection`
    that is used to retrieve DSN information of the nodes. Requires `pglogical`
    and `bdr` extensions installed on all databases.
  • all_bdr_nodes: If logical_replication_mode is set to bdr, then it is possible to specify only the Initial Connection (see below) and let LiveCompare build the connection list based on the current list of active BDR nodes. Default: off.

  • max_parallel_workers: Number of parallel processes to be considered. Each process will work on a table from the queue. Default: 2.

Important: Each process will keep N+1 open connections: 1 to each data connection and another 1 to the output database.

  • buffer_size: Number of rows to be retrieved from the tables on every data fetch operation. Default: 4096.

  • log_level: Verbosity level in the log file. Possible values: debug, info, warning or error. Default: info.

  • data_fetch_mode: Affects how LiveCompare fetches data from the database.

    • prepared_statements: Uses prepared statements (a query with LIMIT) for data fetch. Only a very small amount of data (buffer_size = 4096 rows by default) is fetched each time, so it has the smallest impact of all 3 modes, and by the same reason it's the safer fetch mode. Allows asynchronous data fetch (defined by parallel_data_fetch). For the general use case, this fetch method provides a good performance, but a performance decrease can be felt for large tables. This is the default and strongly recommended when server load is medium-high.

    • server_side_cursors_with_hold: Uses server-side cursors WITH HOLD for data fetch. As table data is retrieved in a single transaction, it holds back xmin and can cause bloat and replication issues, and also prevent VACUUM from running well. Also, the WITH HOLD clause tells Postgres to materialize the query (workers may hang for a few seconds waiting for the data to be materialized), so the whole table data consumes RAM and can be stored on Postgres side disk as temporary files. All that impact can be decreased by using parallel_chunk_rows (set to 10000000 by default), and speed can be improved by increasing buffer_size a little. Allows asynchronous data fetch (defined by parallel_data_fetch). For the general use case, this fetch method doesn't provide any benefits when compared to prepared_stataments, but for multiple small tables it's faster. However, this mode is recommended only when load is very low, for example on tests and migration scenarios.

    • server_side_cursors_without_hold: Uses server-side cursors WITHOUT HOLD for data fetch. As server_side_cursors_with_hold, this mode can also hold back xmin, thus potentially can cause bloat, VACUUM and replication issues on Postgres, but such impact is higher because WITHOUT HOLD cursors require an open transaction for the whole comparison session (this will be lifted in further versions). As the snapshot is held for the whole comparison session, comparison results might be helpful depending on your use case. As the query is not materialized, memory usage and temp file generation remains low. Asynchronous data fetch is not allowed. In terms of performance, this mode is slower for the general use case, but for large tables it can be the faster. It's recommended when load on the database is low-medium.

Important: the choice of the right data_fetch_mode for the right scenario is very important. Using prepared statements has the smallest footprint on the database server, so it's the safest approach, and it's good for the general use case. Another point is that prepared statements allow LiveCompare to always see the latest version of the rows, which may not happen when using server-side cursors on a busy database. So it's recommended to use prepared_statements for production, high load servers; and either server_side_cursors_* settings for testing, migration scenarios, and low load servers. The best strategy would probably mix server_side_cursors_without_hold for very large tables, and prepared_statements for the remaining tables. Refer to the table below for a comparison on the cost/benefit ratio:

prepared_statementsserver_side_cursors_with_holdserver_side_cursors_without_hold
xmin holdvery lowmediumhigh
xmin released perbufferchunkwhole comparison session
temp filesvery lowvery highlow
memoryvery lowhighlow
allows async connsyesyesno
fastest forgeneralsmall tableslarge tables
recommended loadhighvery lowlow-medium

Note about Oracle: For Oracle, the data_fetch_mode setting is completely ignored, and data will always be fetch from Oracle using direct queries with LIMIT, without using prepared statements or cursors.

  • parallel_chunk_rows: Minimum number of rows required to consider splitting a table into multiple chunks for parallel comparison. A hash is used to fetch data, so workers don't clash with each other. Each table chunk will have no more than parallel_chunk_rows rows. Setting it to any value < 1 disables table splitting. If any connections are not PostgreSQL, then table splitting is disabled automatically by LiveCompare. Default: 10000000.

  • parallel_data_fetch: If data fetch should be performed in parallel (i.e., using async connections to the databases). Improves performance of multi-way comparison. If any data connections are not PostgreSQL, then this setting is automatically disabled. It's only allowed when data_fetch_mode = prepared_statements or data_fetch_mode = server_side_cursors_with_hold. Default: on.

  • comparison_algorithm: Affects how LiveCompare works through table rows to compare data. Using hashes is faster than full row comparison. It can assume one of the following values:

    - `full_row`: Disables row comparison using hashes. Full comparison, in this
    case, is  performed by comparing the row column by column.
    
    - `row_hash`: Enables row comparison using hashes and enables table
    splitting. Tables are split so each worker compares a maximum of
    `parallel_chunk_rows` per table. Data row is hashed in PostgreSQL, so the
    comparison is faster than `full_row`. However, if for a specific row the
    hash does not match, then for that specific row, LiveCompare will fallback
    to `full_row` algorithm (i.e., compare row by row). If any data connections
    is not PostgreSQL, then LiveCompare uses a row hash that's defined as the MD5
    hash of the concatenated column values of the row, being considered a
    "common hash" among the database technologies being compared.
    
    - `block_hash`: Works the same as `row_hash`, but instead of comparing row
    by row, LiveCompare builds a "block hash", i.e., a hash of the hashes of all
    rows in the data buffer that was just fetched (maximum of `buffer_size`
    rows). Conceptually it works like a 2-level Merkle Tree. If the block hash
    matches, then LiveCompare advances the whole block (this is why this
    comparison algorithm is faster than `row_hash`). If block hash does not
    match, then LiveCompare falls back to `row_hash` and performs comparison row
    by row in the buffer to find the divergent rows. This is the default value.
  • min_time_between_round_saves: Time in seconds to wait before updating each round state when the comparison algorithm is in progress. Note that when the round finishes, LiveCompare always updates the round state for that table. Default: 60 seconds.

Important: If the user cancels execution of LiveCompare by hitting Ctrl-c and starts it again, then LiveCompare will resume the round for that table, starting from the point where the round state was saved.

  • stop_after_time: Time in seconds after which LiveCompare will automatically stop itself as if the user had hit Ctrl-c. The comparison session that was interrupted, if not finished yet, can be resumed again by passing the session ID as argument in the command line. Default is stop_after_time = 0, which means that automatic interruption is disabled.

  • consensus_mode: Consensus algorithm used by LiveCompare to determine which data connections are divergent. Possible values are simple_majority, quorum_based or source_of_truth. If consensus_mode = source_of_truth then difference_sources_of_truth must be filled. Default is simple_majority.

  • difference_required_quorum: If consensus_mode = quorum_based, then this setting specified the minimum quorum is required to decide which connections are divergent. Should be a number between 0.0 and 1.0 (0.0 means no connection is required while 1.0 means all connections are required, both cases are extreme and should not be used). The default value is 0.5, and we recommend using a value close to that.

  • difference_sources_of_truth: Comma-separated list of connections names (or node names, if logical_replication_mode = bdr and all_bdr_nodes = on) that should be considered as source of truth. It is only used when consensus_mode = source_of_truth. For example: difference_sources_of_truth = node1,node2. In this example, either the sections node1 Connection and node2 Connection should be defined in the .ini file or all_bdr_nodes = on and only the Initial Connection is defined, while node1 and node2 should be valid BDR node names.

  • difference_tie_breakers: Comma-separated list of connection names (or node names, if logical_replication_mode = bdr and all_bdr_nodes = on) that should be considered as tie breakers whenever the consensus algorithm finds a tie situation. For example: difference_tie_breakers = node1,node2. In this example, either the sections node1 Connection and node2 Connections should be defined in the .ini file or all_bdr_nodes = on and only the Initial Connection is defined, while node1 and node2 should be valid BDR node names. Default is to not consider any connection as tie breaker.

  • difference_statements: Controls what kind of DML statements will be generated by LiveCompare. The value of difference_statements can be one of:

    - `all` (default)
    - `inserts`
    - `updates`
    - `deletes`
    - `inserts_updates`
    - `inserts_deletes`
    - `updates_deletes`
  • difference_allow_null_updates: Determines whether commands like UPDATE SET col = NULL will be allowed in the difference report. Default: on.

  • difference_statement_order: Controls order of DML statements that will be generated by LiveCompare. The value of difference_statement_order can be one of:

    - `delete_insert_update`
    - `delete_update_insert` (default)
    - `insert_update_delete`
    - `insert_delete_update`
    - `update_insert_delete`
    - `update_delete_insert`
  • difference_fix_replication_origin: When working with BDR databases, for difference LiveCompare will create a specific replication origin if it doesn't exist yet, then use the replication origin to create apply script with DML fixes. The setting difference_fix_replication_origin specifies the name of the replication origin used by LiveCompare. If the user doesn't set any value for this setting, then LiveCompare will automatically set difference_fix_replication_origin = bdr_local_only_origin. Note that the replication origin that LiveCompare creates is not dropped to allow verification after the comparison, but if needed the replication origin can be manually dropped later. Requires logical_replication_mode = bdr.

IMPORTANT: Please note that BDR 3.6.18 introduced the new pre-created bdr_local_only_origin replication origin to be used for applying local-only transactions. So if LiveCompare is connected to BDR 3.6.18, it won't create this replication origin, and it is recommended that the user should not try to drop this replication origin.

  • difference_fix_start_query: Arbitrary query that is executed at the beginning of the apply script generated by LiveCompare. Additionally if a BDR comparison is being performed and the difference_fix_start_query is empty, then LiveCompare also automatically does the following:

    - If the divergent connection is BDR 3.6.7, add
    `SET LOCAL bdr.xact_replication = off;`
    - Add commands that setup transaction to use the replication origin
    specified in `difference_fix_replication_origin`.
  • show_progress_bars: Determines whether or not progress bars should be shown in the console output. Disabling this setting might be useful for batch executions. Default: on.

  • output_schema: In the output connection, the schema where the comparison report tables will be created. Default: livecompare.

  • hash_column_name: Every data fetch will contain a specific column which is the hash of all actual columns in the row. This setting specifies the name of this column. Default: livecompare_hash.

  • rownumber_column_name: Some fetches need to use the row_number() function value inside a query column. This setting specifies the name of this column. Default: livecompare_rownumber.

  • fetch_row_origin: When this setting is enabled, LiveCompare fetches the origin name for each divergent row, which might be useful for debugging purposes. Default: off. To be enabled, requires logical_replication_mode set to pglogical or bdr.

  • column_intersection: When this setting is enabled, for a given table that is being compared, LiveCompare will only work on the intersection of columns from the table on all connections, ignoring extra columns that might exist on any of the connections. When this setting is disabled, LiveCompare will check if columns are equivalent on the table on all connections, and abort the comparison of the table if there are any column mismatches. Default: off.

Important: If table has PK, then the PK columns are not allowed to be different, even if column_intersection = on.

  • oracle_ignore_unsortable: When enabled, tells LiveCompare to ignore columns with Oracle unsortable data types (BLOB, CLOB, NCLOB, BFILE) if column is not part of the table PK. If enabling this setting, it is recommended to also enable column_intersection.

  • oracle_user_tables_only: When enabled, tells LiveCompare to fetch table metadata only from the Oracle logged in user, which is faster because it reads, for example, from sys.user_tables and sys.user_tab_columns instead of sys.all_tables and sys.all_tab_columns. Default: off.

  • oracle_fetch_fk_metadata: When enabled, tells LiveCompare to fetch foreign key metadata, which can be a slow operation. Overrides the value of the setting fetch_fk_metadata on the Oracle connection. Default: off.

  • schema_qualified_table_names: Table names are treated as schema-qualified when this setting is enabled. Disabling it allows comparison of tables without using schema-qualified table names: on Oracle x Postgres comparisons, it requires also enabling oracle_user_tables_only, while on Postgres x Postgres, it allows for comparisons of tables that are under different schemas, even in the same database. Also, when schema_qualified_table_names is enabled, Table Filter -> tables, Row Filter and Column Filter allow table name without the schema name. Default: on.

  • force_collate: When set to a value other than off and to a valid collation name, forces the specified collation name in ORDER BY operations in all Postgres databases being compared. Useful when comparing Postgres databases with different collation or when comparing Oracle versus Postgres databases (in this case users should set force_collate = C). Will assume value C if comparing mixed technologies (like Oracle vs PostgreSQL) and no collation is specified. Default: off.

  • work_directory: path to the LiveCompare working directory. The session folder containing output files will be created in such directory. Default: . (current directory).

  • abort_on_setup_error: when enabled, if LiveCompare hits any error when trying to setup a table comparison round, the whole comparison session is aborted. Default: off.

Important: Setting abort_on_setup_error is only considered during compare mode. In recheck mode, LiveCompare always aborts at the first error in setup.

  • custom_dollar_quoting_delimiter: when LiveCompare finds differences, it will output the DML using dollar quoting on strings. The default behavior is create a random string to compose it. If you want by any means use a custom one, you can set this parameter as the delimiter to be used. You just need to set the constant, not the $ symbols around the constant. Defaults to off, which means LiveCompare will use a md5 hash of the word LiveCompare.

  • session_replication_role_replica: when enabled LiveCompare will use session_replication_role PostgreSQL setting as replica in the output apply scripts. That's useful if you want to prevent firing triggers and rules while applying DML in the nodes with divergences. Enabling it requires a PostgreSQL super user, otherwise will take no effect. Defaults to off.

  • split_updates: when enabled LiveCompare will split UPDATE divergences, i.e., instead of generating a UPDATE DML, it will generate corresponding DELETE and INSERT in the apply script. Defaults to off.

  • float_point_round: an integer to specify decimal digits that LiveCompare should round when comparing float point values coming from the database. Default is -1, which disables float point rounding.

Initial Connection

The initial connection is used only when logical_replication_mode is set to pglogical or bdr, and is used only when the program starts, to fetch DSN from node names, if the user has set data connections using only the node_name setting.

  • technology: RDBMS technology. Currently the only possible value is postgresql.
  • dsn: PostgreSQL connection string. If dsn is set, then host, port, dbname and user are ignored. The dsn setting can also have all other parameter key words allowed by libpq.
  • host: Server address. Leave empty to use the Unix socket connection.
  • port: Port. Default: 5432.
  • dbname: Database name. Default: postgres.
  • user: Database user. Default: postgres.
  • application_name. Application name. Can be used even if the user set dsn instead of all other connection information. Default: livecompare_initial.

Output Connection

The output connection specifies where LiveCompare will create the comparison report tables.

  • technology: RDBMS technology. Currently the only possible value is postgresql.
  • dsn: PostgreSQL connection string. If dsn is set, then host, port, dbname and user are ignored. The dsn setting can also have all other parameter key words allowed by libpq.
  • host: Server address. Leave empty to use the Unix socket connection.
  • port: Port. Default: 5432.
  • dbname: Database name. Default: postgres.
  • user: Database user. Default: postgres.
  • application_name. Application name. Can be used even if the user set dsn instead of all other connection information. Default: livecompare_output.

Data Connection

A "data connection" is a connection section similar to the Initial Connection and the Output Connection, but LiveCompare effectively fetches and compares data on the data connections.

Similarly to the Initial Connection and Output Connection, a "data connection" is defined in a named section. The section name should be of the form Name Connection, being Name any single-worded string starting with an alphabetic character. In this case, whatever the user fills in Name is called the "Connection ID" of the data connection. It is also required that each data connection has an unique Connection ID in the whole list of data connections.

If logical_replication_mode = bdr and all_bdr_nodes = on, then the user is not required to specify any data connection, because LiveCompare will build the data connection list by fetching BDR metadata from the Initial Connection.

  • technology: RDBMS technology. Currently possible values are postgresql or oracle.
  • node_name: Name of the node in the cluster. Requires logical_replication_mode set to pglogical or bdr, and also requires that the Initial Connection is filled. If node_name is set, then dsn, hostport, dbname and user settings are all ignored.
  • dsn: PostgreSQL connection string. If dsn is set, then host, port, dbname and user are ignored. The dsn setting can also have all other parameter key words allowed by libpq.
  • host: Server address. Leave empty to use the Unix socket connection.
  • port: Port. Default: 5432.
  • dbname: Database name. Default: postgres.
  • service: Service name, used in Oracle connections. Default: XE.
  • user: Database user. Default: postgres.
  • password: Plain text password. We recommend not to use this, but in some legacy connections it might be required.
  • application_name. Application name. Can be used even if the user set dsn or node_name instead of all other connection information. Default: livecompare_<Connection ID>.
  • start_query: Arbitrary query that is executed each time a connection to a database is open.
  • fetch_fk_metadata: If LiveCompare should gather metadata about foreign keys on the connection. Default: on.

Table Filter

If omitted or left empty, this section from the .ini file will mean that LiveCompare should be executed against all tables in the first database.

If you want LiveCompare to be executed against a specific set of tables, there are different ways to specify this:

  • publications: You can filter specific publications, and LiveCompare will use only the tables associated to those publications. The variable publication_name can be used to build the conditional expression, for example:
publications = publication_name = 'livepub'

Requires logical_replication_mode = native.

  • replication_sets: When using pglogical or BDR, you can filter specific replication sets, and LiveCompare will work only on the tables associated to those replication sets. The variable set_name can be used to build the conditional expression, for example:
replication_sets = set_name in ('default', 'bdrgroup')

Requires logical_replication_mode = pglogical or logical_replication_mode = bdr.

  • schemas: You can filter specific schemas, and LiveCompare will work only on the tables that belong to those schemas. The variable schema_name can be used to build the conditional expression, for example:
schemas = schema_name != 'badschema'
  • tables: The variable table_name can help you build a conditional expression to filter only the tables you want LiveCompare to work on, for example:
tables = table_name not like '%%account'

Please note that, in any conditional expression, the % character should be escaped as %%.

The table name should be schema-qualified, unless schema_qualified_table_names is disabled. For example, it's possible to filter only a specific list of tables:

tables = table_name in ('myschema1.mytable1', 'myschema2.mytable2')

If you have disabled general setting schema_qualified_table_names, then you should also set an appropriate search_path for Postgres in the connection start_query setting, for example:

[General Setting]
...
schema_qualified_table_names = off

[My Connection]
...
start_query = SET search_path TO myschema1, myschema2

[Table Filter]
tables = table_name in ('mytable1', 'mytable2')

IMPORTANT: Please note that if two or more schemas that were set on search_path contains a table if the same name, just the first one found will be considered in the comparison.

The Table Filter section can have a mix of publications, replication_sets, schemas and tables filters, and LiveCompare will consider the set of tables that are in the intersection of all filters you specified. For example:

[Table Filter]
publications = publication_name = 'livepub'
replication_sets = set_name in ('default', 'bdrgroup')
schemas = schema_name != 'badschema'
tables = table_name not like '%%account'

Also please note that the table filter is applied in the first database, to build the table list. If a table exists in the first database and is being considered in the filter, but does not exist in any other database, then you will see something like this in the logs, and the comparison for that specific table will be skipped.

2019-06-17 11:52:41,403 - ERROR - live_table.py - 55 - GetMetaData - P1: livecompare_second_1: Table public.test does not exist
2019-06-17 11:52:41,410 - ERROR - live_round.py - 201 - Initialize - P1: Table public.test does not exist on second connection. Aborting comparison

Similarly, if a table exists in any other database but does not exist in the first database, then it won't be considered in the comparison, even if you didn't apply any table filter.

A comparison for a specific table will also be skipped if the table column names are not exactly the same (unless column_intersection is enabled), and in the same order. An appropriate message will be included in the log file as well.

Currently LiveCompare does not check if data types nor constraints are the same on both tables.

IMPORTANT: please note that conflicts mode doesn't make use of table filter.

Row Filter

In this section you can apply a row-level filter to any table, so LiveCompare will work only on the rows that satisfy the row filter.

You can write a list of tables under this section, one table per line (all table names should be schema qualified unless schema_qualified_table_names is disabled), for example:

[Row Filter]
public.table1 = id = 10
public.table2 = logdate >= '2000-01-01'

In this case, for the table public.table1, LiveCompare will work only in the rows that satisfy the clause id = 10, while for the table public.table2, only rows that satisfy logdate >= '2000-01-01 will be considered in the comparison.

If you have disabled general setting schema_qualified_table_names, then you should also set an appropriate search_path for Postgres in the connection start_query setting, for example:

[General Setting]
...
schema_qualified_table_names = off

[My Connection]
...
start_query = SET search_path TO public

[Row Filter]
table1 = id = 10
table2 = logdate >= '2000-01-01'

Any kind of SQL condition (same as you would put in the WHERE clause) is accepted, in the same line, as the table row filter. For example, if you have a large table and want to compare only a specific number of IDs, it's possible to create a temporary table with all the IDs. Then you can use an IN clause to emulate a JOIN, like this:

[Row Filter]
public.large_table = id IN (SELECT id2 FROM temp_table)

If a row filter is written incorrectly, then LiveCompare will try to apply the filter but will fail. So the comparison for this specific table will be skipped, and an exception will be written to the log file.

If a table is listed in the Row Filter section, but somehow got filtered out by the Table Filter, then the row filter for this table will be silently ignored.

IMPORTANT: please note that conflicts mode doesn't make use of row filter.

Column Filter

In this section you can apply a column-level filter to any table, so LiveCompare will work only on the columns that are not part of the column filter.

You can write a list of tables under this section, one table per line (all table names should be schema qualified unless schema_qualified_table_names is disabled). For example, considering both public.table1 and public.table2 have the columns column1, column2, column3, column4 and column5:

[Column Filter]
public.table1 = column1, column3
public.table2 = column1, column5

In this case, for the table public.table1, LiveCompare will work only in the columns column2, column4 and column5, filtering out column1 and column3, while for the table public.table2, only the columns column2, column3 and column4 will be considered in the comparison, filtering out column1 and column5.

If you have disabled general setting schema_qualified_table_names, then you should also set an appropriate search_path for Postgres in the connection start_query setting, for example:

[General Setting]
...
schema_qualified_table_names = off

[My Connection]
...
start_query = SET search_path TO public

[Column Filter]
table1 = column1, column3
table2 = column1, column5

If absent column names are given in the column filter, that is, column doesn't exist in the given table, then LiveCompare will log a message about the columns that could not be found and ignore them, using just the valid ones, if any.

If a table is listed in the Column Filter section, but somehow got filtered out by the Table Filter, then the column filter for this table will be silently ignored.

IMPORTANT: Please note that if a column specified in a Column Filter is part of the table PK, then it won't be ignored in the comparison. LiveCompare will log that and ignore the filter of such column.

IMPORTANT: please note that conflicts mode doesn't make use of column filter.

Conflicts Filter

In this section you can specify a filter to be used in --conflicts mode while fetching conflicts from BDR nodes. You can build any SQL conditional expression, and use below fields in the expression:

  • origin_node: the upstream node of the subscription
  • target_node: the downstream node of the subscription
  • local_time: the timestamp when the conflict occurred in the node
  • conflict_type: the type of the conflict
  • conflict_resolution: the resolution which was applied
  • nspname: schema name of the involved relation
  • relname: relation name of the involved relation

You must use conflicts attribute under the section. Please find an example below:

[Conflicts Filter]
conflicts = conflict_type = 'update_missing' AND nspname = 'my_schema'

By adding above piece of configuration to your INI file, LiveCompare would fetch just conflicts that are of type update_missing, and related to tables under schema my_schema while querying for conflicts in each of the BDR nodes.

IMPORTANT: Please note that this section is exclusive for --conflicts mode.