Performing data validation v7

The current working directory from which you invoke the Data Validator script runValidation.sh (runValidation.bat for Windows) must be the bin subdirectory containing the script (that is, XDB_HOME/bin).

For example, if the Replication Server is installed in its default directory location, then issue the following command before invoking the Data Validator:

cd /usr/edb/xdb/bin

Similarly for Windows hosts, issue the following:

cd C:\Program Files\edb\EnterpriseDB-xDBReplicationServer\bin

The general command format for invoking the Data Validator is the following:

./runValidation.sh { –ss | --source-schema } schema_name
  [ option ] ...

schema_name is the name of the schema in the source database containing the tables to validate.

For Windows hosts, the command format is the following:

runValidation { –ss | --source-schema } schema_name
  [ option ] ...

The following option displays the Data Validator version:

./runValidation.sh { –v | --version }

On Linux the version is displayed as follows:

$ ./runValidation.sh --version
EnterpriseDB DataValidator Build 3

On Windows the version is displayed as follows:

C:\Program Files\edb\EnterpriseDB-xDBReplicationServer\bin>runValidation -v
EnterpriseDB DataValidator Build 3

The following option displays the help information.

./runValidation.sh { –h | --help }

For example,

$ ./runValidation.sh --help
Usage:
   runValidation.sh (-v  | --version) | (-h | --help)
   runValidation.sh (-ss | --source-schema) SOURCE_SCHEMA [OPTIONS] CONNECTION_INFO_FILE

OPTIONS:

   (-ts | --target-schema)   target-schema-name
   (-it | --include-tables)  comma-seperated-tables-name
   (-et | --exclude-tables)  comma-seperated-tables-name

   (-ld | --logging-dir)     logging-dir-path
   (-ds  | --display-summary) (true|false)
   (-srs | --skip-rowsonlyin-source) (true|false)
   (-srt | --skip-rowsonlyin-target) (true|false)
   (-srb | --skip-rowsin-both) (true|false)
   (-fs  | --fetch-size) row count
   (-bs  | --batch-size) row count

   (-sdbms | --source-dbms)     source database type
   (-sh | --source-host) source database server name/IP
   (-sp | --source-port) source database server port
   (-sdb | --source-database)     source database name
   (-su | --source-user) source database user id
   (-spw | --source-password) source database user password
   (-tdbms | --target-dbms)     target database type
   (-th | --target-host) target database server name/IP
   (-tp | --target-port) target database server port
   (-tdb | --target-database)     target database name
   (-tu | --target-user) target database user id
   (-tpw | --target-password) target database user password
   (-uoc | --use-ora-case) use oracle (upper) case for table name
New enhancement

The --use-ora-case option is available in Replication Server 7.1 and later.

Note

Data Validator does not support column mapping. With custom column mapping, table columns in the source and target databases can have different data types. As a result, the Data Validator is not able to validate the data when column mapping is applied.

The general syntax for all options except for --version and --help is shown by the following:

./runValidation.sh –ss schema
  [ -ts schema ]
  [ -it table_1 [,table_2 ] ... ]
  [ -et table_1 [,table_2 ] ... ]
  [ -srs { true | false } ]
  [ -srt { true | false } ]
  [ -srb { true | false } ]
  [ -ld log_directory_path ]
  [ -ds { true | false } ]
  [ -sdbms database_type ]
  [ -sh host ]
  [ -sp port ]
  [ -sdb dbname ]
  [ -su user ]
  [ -spw password ]
  [ -tdbms database_type ]
  [ -th host ]
  [ -tp port ]
  [ -tdb dbname ]
  [ -tu user ]
  [ -tpw password ]
  [ -bs row_count ]
  [ -fs row_count ]
  [ -uoc ]

For clarity, the syntax diagram shows only the single-character form of the option. The description of each option lists both the single-character and multi-character forms.

Specifying any database connection option (-sdbms through -tpw listed in the syntax diagram) overrides the corresponding parameter in the datavalidator.properties file. See Installing and configuring the data validator for information on the datavalidator.properties file.

Options

-ss, --source-schema schema

The schema of the source database containing the tables to compare against the target database.

-ts, --target-schema schema

The schema of the target database containing the tables to compare against the source database. If omitted, the schema of the target database is the same schema as specified for the source database with the -ss option.

-it, --include-tables table_1 [,table_2 ] ...

The tables in the source schema to include for comparison. If omitted, all tables in the source schema are compared against tables in the target schema. The exception is the tables excluded from comparison using the -et option. Don't use white space between the comma and table names.

-et, --exclude-tables table_1 [,table_2 ] ...

The tables in the source schema to exclude from comparison. If omitted, only those tables specified with the -it option are included for comparison. If both the -it and -et options are omitted, all source schema tables are included for comparison. Don't use white space between the comma and table names.

-srs, --skip-rowsonlyin-source { true | false }

When you specify true, the logging of differences for rows that exist only in the source database table are skipped. The default is false.

-srt, --skip-rowsonlyin-target { true | false }

When you specify true, the logging of differences for rows that exist only in the target database table are skipped. The default is false.

-srb, --skip-rowsin-both { true | false }

When you specify true, the logging of differences for rows:

  • That exist both in the source and target database tables
  • Have the same primary key
  • Have different non-primary key values

are skipped. The default is false.

-ld, --logging-dir log_directory_path

Directory path to create and store the Data Validator log and diff files. If log_directory_path doesn't exist, Data Validator attempts to create it. If a full directory path isn't specified log_directory_path is created or assumed to be located relative to the XDB_HOME/bin subdirectory where the runValidation.sh script is invoked. (That is, the logs directory is XDB_HOME/bin/log_directory_path.) Be sure the operating system account used to invoke the runValidation.sh script has the privileges to create the directory if needed or to create files in the specified directory. The default is the XDB_HOME/bin/logs directory.

-ds, --display-summary { true | false }

Specify true to display only the Data Validator summary. This value omits the source and target database connection information as well as the detailed breakdown of the results by source database table. Specify false to display all of the Data Validator results. The type and amount of information that is displayed at the command line console when the Data Validator is invoked is the same information that is also stored in the log file for that run. The default is false.

-sdbms, --source-dbms database_type

The type of the source database server. Supported types are oracle, enterprisedb, sqlserver, sybase, and mysql.

-sh, --source-host host

The IP address or server name of the host where the source database server is running.

-sp, --source-port port

The port number on which the source database server is listening for connections.

-sdb, --source-database dbname

The name of the source database.

-su, --source-user user

The database user name for connecting to the source database.

-spw, --source-password password

The password of the source database user in unencrypted form.

-tdbms, --target-dbms database_type

The type of the target database server. Supported types are enterprisedb and oracle.

-th, --target-host host

The IP address or server name of the host whereh the target database server is running.

-tp, --target-port port

The port number on which the target database server is listening for connections.

-tdb, --target-database dbname

The name of the target database.

-tu, --target-user user

The database user name for connecting to the target database.

-tpw, --target-password password

The password of the target database user in unencrypted form.

-bs, --batch-size row_count

The -bs option specifies the number of rows to group in a batch to use for comparison across the source and target database tables. For example, if a table contains 1000 rows, then a -bs setting of 100 requires 10 batch iterations to complete the comparison across the source and target databases. The Data Validator reads 100 rows, both from the source and target tables, and adds them in source and target buffers. The validation thread then reads the 100 rows from the source and target buffers and performs the comparison. It then moves to read and prepare the next 100 rows for comparison, and so on. The actual database round trips required to bring in 100 rows from the database depends on the -fs option for the fetch size. For example, an -fs setting of 100 needs just one round trip and an -fs setting of 10 requires 10 database round trips.

-fs, --fetch-size row_count

Performing data validation for tables that are quite large can cause the Data Validator to stop with an out-of-heap-space error when using the default fetch size of 5000 rows. Use the -fs option to specify a smaller fetch size to help avoid the out-of-heap-space issue. The result set iteration brings in as many rows as represented by the row_count value in a single database round trip.

Examples

The following examples use an Oracle source database and an EDB Postgres Advanced Server target database to compare the tables in schema EDB on Oracle against the tables in schema public in EDB Postgres Advanced Server.

The following lists the tables in schema EDB along with the content of tables DEPT and EMP in the Oracle source database:

SQL> SELECT table_name FROM user_tables;
Output
TABLE_NAME
------------------------------
ORATAB
DEPT
EMP
JOBHIST
SQL> SELECT * FROM dept;
Output
    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON
        50 FINANCE        CHICAGO
SQL> SELECT * FROM emp;
Output
     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10
      9001 SMITH      ANALYST         7566                 8500                    20
      9002 ROGERS     SALESMAN        7698                 8000       4000         30

16 rows selected.

The following lists the tables in schema public along with the content of tables dept and emp in the EDB Postgres Advanced Server edb database:

edb=# \dt
            List of relations
Output
Schema |  Name   | Type  |    Owner
--------+---------+-------+--------------
 public | dept    | table | enterprisedb
 public | emp     | table | enterprisedb
 public | jobhist | table | enterprisedb
(3 rows)
edb=# SELECT * FROM dept;
Output
 deptno |   dname    |   loc
--------+------------+----------
     10 | ACCOUNTING | NEW YORK
     20 | RESEARCH   | DALLAS
     30 | SALES      | CHICAGO
     40 | OPERATIONS | BOSTON
(4 rows)
edb=# SELECT * FROM emp;
Output
 empno | ename  |    job    | mgr  |      hiredate      |   sal   |  comm   | deptno
-------+--------+-----------+------+--------------------+---------+---------+--------
  7369 | SMITH  | CLERK     | 7902 | 17-DEC-80 00:00:00 |  800.00 |         |     20
  7499 | ALLEN  | SALESMAN  | 7698 | 20-FEB-81 00:00:00 | 1600.00 |  300.00 |     30
  7521 | WARD   | SALESMAN  | 7698 | 22-FEB-81 00:00:00 | 1250.00 |  500.00 |     30
  7566 | JONES  | MANAGER   | 7839 | 02-APR-81 00:00:00 | 2975.00 |         |     20
  7654 | MARTIN | SALESMAN  | 7698 | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 |     30
  7698 | BLAKE  | MANAGER   | 7839 | 01-MAY-81 00:00:00 | 2850.00 |         |     30
  7782 | CLARK  | MANAGER   | 7839 | 09-JUN-81 00:00:00 | 2450.00 |         |     10
  7788 | SCOTT  | ANALYST   | 7566 | 19-APR-87 00:00:00 | 3000.00 |         |     20
  7839 | KING   | PRESIDENT |      | 17-NOV-81 00:00:00 | 5000.00 |         |     10
  7844 | TURNER | SALESMAN  | 7698 | 08-SEP-81 00:00:00 | 1500.00 |    0.00 |     30
  7876 | ADAMS  | CLERK     | 7788 | 23-MAY-87 00:00:00 | 1100.00 |         |     20
  7900 | JAMES  | CLERK     | 7698 | 03-DEC-81 00:00:00 |  950.00 |         |     30
  7902 | FORD   | ANALYST   | 7566 | 03-DEC-81 00:00:00 | 3000.00 |         |     20
  7934 | MILLER | CLERK     | 7782 | 23-JAN-82 00:00:00 | 1300.00 |         |     10
  9001 | SMITH  | SALESMAN  | 7698 |                    | 8000.00 | 4000.00 |     30
  9002 | ROGERS | SALESMAN  | 7698 |                    | 9500.00 | 4000.00 |     30
(16 rows)

Note the following differences:

  • The Oracle EDB schema contains one moew table named ORATAB that doesn't exist in the EDB Postgres Advanced Server public schema.
  • The Oracle DEPT table contains one extra row with DEPTNO 50 that doesn't exist in the EDB Postgres Advanced Server dept table.
  • The rows in the EMP table with EMPNO values 9001 and 9002 have column values that differ between the Oracle and EDB Postgres Advanced Server tables.
  • In this example, the JOBHIST table contains identical rows for both the Oracle and Advanced Server tables.

The content of the datavalidator.properties file is set as follows:

###############################################################
            Source database connection
###############################################################

#source_dbms=(enterprisedb | oracle | sqlserver | sybase | mysql)

source_dbms=oracle
source_host=192.168.2.23
source_port=1521
source_database=xe
source_user=edb
source_password=password

###############################################################
         Target database connection
###############################################################

#target_dbms=(enterprisedb | oracle)

target_dbms=enterprisedb
target_host=localhost
target_port=5444
target_database=edb
target_user=enterprisedb
target_password=password

The following example compares all tables in the Oracle EDB schema against the EDB Postgres Advanced Server public schema.

The Data Validator log files are created in directory /home/user/datavalidator_logs as specified with the -ld option. The operating system account used to invoke the runValidation.sh script has write access to the /home/user directory so the Data Validator can create the datavalidator_logs subdirectory.

$ cd /usr/edb/xdb/bin
$ pwd
/usr/edb/xdb/bin
$ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs

EnterpriseDB DataValidator Build 3
----------------------------------------------------------------------------
Source and target databases connection information
----------------------------------------------------------------------------

Source database:

     DBMS:     ORACLE
     Host:     192.168.2.23
     Port:     1521
     Database: xe
     User:     edb

Target database:

     DBMS:     ENTERPRISEDB
     Host:     localhost
     Port:     5444
     Database: edb
     User:     enterprisedb

----------------------------------------------------------------------------
Databases data validation process started...
----------------------------------------------------------------------------

Validating Table DEPT
  Rows validated: 5
  Finished validating table DEPT with 1 errors.
  Logging errors details in the diff file...

Validating Table EMP
  Rows validated: 16
  Finished validating table EMP with 2 errors.
  Logging errors details in the diff file...

Validating Table JOBHIST
  Rows validated: 17
  Finished validating table JOBHIST with 0 errors.

Validating Table ORATAB
  Table not validated as it does not exist on the target database.


DataValidator found 3 errors across source and target databases.
For detailed error report see datavalidator_20150713-144417.diff file.

----------------------------------------------------------------------------
Data validation process has completed.
----------------------------------------------------------------------------

***************************************************************
                            DataValidator Summary
***************************************************************

       All tables count: 4

       Validated tables count: 3
       Rows count: 38
       Errors count: 3

       Missing tables on the target database count: 1
       Tables list:
           - EDB.ORATAB

       Tables having only unsupported datatypes count: 0

       Tables having primary key limitation count: 0

       Total time(s): 0.678
       Rows per second: 56

***************************************************************

The Data Validator output indicates the following:

  • There is one error in the DEPT table (the missing row).
  • There are two errors in the EMP table (the two rows with mismatching column values).
  • The JOBHIST table contains no errors.
  • The ORATAB table doesn't exist on the target database.

The following shows the files created in the Data Validator logs directory:

$ pwd
/home/user/datavalidator_logs
$ ls -l
total 24
-rw-rw-r-- 1 user user 18999 Aug 13 15:44 datavalidator_20150713-144417.diff
-rw-rw-r-- 1 user user  2133 Aug 13 15:44 datavalidator_20150713-144417.log

The log file contains the same content as displayed when the Data Validator is invoked. The diff file compares the differences where errors were detected.

The following example includes only tables dept and emp with the -it option when comparing the Oracle EDB schema against the Advanced Server public schema.

$ cd /usr/edb/xdb/bin
$ pwd
/usr/edb/xdb/bin
$ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs -it dept,emp

EnterpriseDB DataValidator Build 3
----------------------------------------------------------------------------
Source and target databases connection information
----------------------------------------------------------------------------

Source database:

     DBMS:     ORACLE
     Host:     192.168.2.23
     Port:     1521
     Database: xe
     User:     edb

Target database:

     DBMS:     ENTERPRISEDB
     Host:     localhost
     Port:     5444
     Database: edb
     User:     enterprisedb

----------------------------------------------------------------------------
Databases data validation process started...
----------------------------------------------------------------------------

Validating Table DEPT
  Rows validated: 5
  Finished validating table DEPT with 1 errors.
  Logging errors details in the diff file...

Validating Table EMP
  Rows validated: 16
  Finished validating table EMP with 2 errors.
  Logging errors details in the diff file...


DataValidator found 3 errors across source and target databases.
For detailed error report see ``datavalidator_20150714-123353.diff`` file.

----------------------------------------------------------------------------
Data validation process has completed.
----------------------------------------------------------------------------

***************************************************************
                            DataValidator Summary
***************************************************************

       All tables count: 2

       Validated tables count: 2
       Rows count: 21
       Errors count: 3

       Missing tables on the target database count: 0

       Tables having only unsupported datatypes count: 0

       Tables having primary key limitation count: 0

       Total time(s): 0.539
       Rows per second: 39

***************************************************************

The following example excludes tables ORATAB and jobhist with the -et option when comparing the Oracle EDB schema against the EDB Postgres Advanced Server public schema. The -ds true option results in the display of only the Data Validator summary.

$ ./runValidation.sh -ss edb -ts public -ld /home/user/datavalidator_logs -et ORATAB,jobhist -ds true
Databases data validation process started...

***************************************************************
                            DataValidator Summary
***************************************************************

       All tables count: 2

       Validated tables count: 2
       Rows count: 21
       Errors count: 3

       Missing tables on the target database count: 0

       Tables having only unsupported datatypes count: 0

       Tables having primary key limitation count: 0

       Total time(s): 0.535
       Rows per second: 39

***************************************************************

For this run, the corresponding log file contains only the Data Validator summary, omitting the source and target database connection information along with the error breakdown by table.