9.2 Performing Data Validation

Table of Contents Previous Next


9 Data Validator : 9.2 Performing Data Validation

The current working directory from which you invoke the Data Validator script runValidation.sh (runValidation.bat for Windows) must be the bin subdirectory containing the script (that is, XDB_HOME/bin).
[ option ] ...
schema_name is the name of the schema in the source database containing the tables to be validated. The choices for option are listed later in this section within the Options subsection.
[ option ] ...
The general syntax for all options except for --version and --help is shown by the following:
[ -ts schema ]
[ -it table_1 [,table_2 ] ... ]
[ -et table_1 [,table_2 ] ... ]
[ -ld log_directory_path ]
[ -ds { true | false } ]
[ -sdbms database_type ]
[ -sh host ]
[ -sp port ]
[ -sdb dbname ]
[ -su user ]
[ -spw password ]
[ -tdbms database_type ]
[ -th host ]
[ -tp port ]
[ -tdb dbname ]
[ -tu user ]
[ -tpw password ]
[ -bs row_count ]
[ -fs row_count ]
For clarity, the preceding syntax diagram shows only the single-character form of the option. The Options subsection lists both the single-character and multi-character forms of the options.
Specification of any database connection option (-sdbms through -tpw listed in the preceding syntax diagram) overrides the corresponding parameter in the datavalidator.properties file. See Section 9.1 for information on the datavalidator.properties file.
-ss, --source-schema schema
-ts, --target-schema schema
-it, --include-tables table_1 [,table_2 ] ...
-et, --exclude-tables table_1 [,table_2 ] ...
The tables within the source schema that are to be excluded from comparison. If omitted, only those tables specified with the -it option are included for comparison. If both the -it and -et options are omitted, all source schema tables are included for comparison. Note: There must be no white space between the comma and table names.
-srs, --skip-rowsonlyin-source { true | false }
When true is specified, the logging of differences for rows that exist only in the source database table are skipped. The default is false.
-srt, --skip-rowsonlyin-target { true | false }
When true is specified, the logging of differences for rows that exist only in the target database table are skipped. The default is false.
-srb, --skip-rowsin-both { true | false }
When true is specified, the logging of differences for rows that exist both in the source and target database tables with the same primary key, but with different non-primary key values are skipped. The default is false.
-ld, --logging-dir log_directory_path
Directory path to where the Data Validator log and diff files are to be created and stored. If log_directory_path does not exist, Data Validator attempts to create it. If a full directory path is not specified log_directory_path is created or assumed to be located relative to the XDB_HOME/bin subdirectory where the runValidation.sh script is invoked. (That is, the logs directory is XDB_HOME/bin/log_directory_path.) Be sure the operating system account used to invoke the runValidation.sh script has the privileges to create the directory if it does not already exist, or to create files in the specified directory if it does already exist. If omitted, the default is the XDB_HOME/bin/logs directory.
-ds, --display-summary { true | false }
Specify true to display only the Data Validator summary. This omits the source and target database connection information as well as the detailed breakdown of the results by source database table. Specify false to display all of the Data Validator results. The type and amount of information that is displayed at the command line console when the Data Validator is invoked is the same information that is also stored in the log file for that run. If omitted, the default is false (that is, all of the Data Validator results is displayed).
-sdbms, --source-dbms database_type
The type of the source database server. Supported types are oracle, enterprisedb, sqlserver, sybase, and mysql.
-sh, --source-host host
-sp, --source-port port
The port number on which the source database server is listening for connections.
-sdb, --source-database dbname
-su, --source-user user
-spw, --source-password password
-tdbms, --target-dbms database_type
-th, --target-host host
-tp, --target-port port
The port number on which the target database server is listening for connections.
-tdb, --target-database dbname
-tu, --target-user user
-tpw, --target-password password
-bs, --batch-size row_count
The -bs option specifies the number of rows to group in a batch to be used for comparison across the source and target database tables. For example, if a table contains 1000 rows, then a -bs setting of 100 requires 10 batch iterations to complete the comparison across the source and target databases. The Data Validator reads 100 rows, both from the source and target tables, and adds them in source and target buffers. The validation thread then reads the 100 rows from the source and target buffers and performs the comparison. It will then move to read and prepare the next 100 rows for comparison and so on. Note that the actual database round trips required to bring in 100 rows from the database depends on the -fs option for the fetch size. For example, an -fs setting of 100 needs just one round trip whereas an -fs setting of 10 requires 10 database round trips.
-fs, --fetch-size row_count
Performing data validation for tables that are quite large in size may cause the Data Validator to terminate with an out of heap space error when using the default fetch size of 5000 rows. Use the -fs option to specify a smaller fetch size to help avoid the out of heap space issue. The result set iteration will bring in as many rows as represented by the row_count value in a single database round trip.
The following lists the tables in schema EDB along with the content of tables DEPT and EMP in the Oracle source database:
The following lists the tables in schema public along with the content of tables dept and emp in the Advanced Server edb database:
The Oracle EDB schema contains one additional table named ORATAB that does not exist in the Advanced Server public schema.
The Oracle DEPT table contains one extra row with DEPTNO 50 that does not exist in the Advanced Server dept table.
The rows in the EMP table with EMPNO values 9001 and 9002 have column values that differ between the Oracle and Advanced Server tables
In this example, the JOBHIST table contains identical rows for both the Oracle and Advanced Server tables.
The content of the datavalidator.properties file is set as follows:
The following example compares all tables in the Oracle EDB schema against the Advanced Server public schema.
The Data Validator log files are created in directory /home/user/datavalidator_logs as specified with the -ld option. The operating system account used to invoke the runValidation.sh script has write access to the /home/user directory so the Data Validator can create the datavalidator_logs subdirectory.
All tables count: 4
Validated tables count: 3
Rows count: 38
Errors count: 3
Missing tables on the target database count: 1
Tables list:
- EDB.ORATAB
Tables having only unsupported datatypes count: 0
Tables having primary key limitation count: 0
Total time(s): 0.678
Rows per second: 56
There is one error in the DEPT table (the missing row).
There are two errors in the EMP table (the two rows with mismatching column values)
The JOBHIST table contains no errors.
The ORATAB table does not exist on the target database.
The following example includes only tables dept and emp with the -it option when comparing the Oracle EDB schema against the Advanced Server public schema.
The following example excludes tables ORATAB and jobhist with the -et option when comparing the Oracle EDB schema against the Advanced Server public schema. The -ds true option results in the display of only the Data Validator summary.

9 Data Validator : 9.2 Performing Data Validation

Table of Contents Previous Next