Postgres Plus Advanced Server Oracle Compatibility Developer's Guide : 11.2 EDB*Loader

Previous PageTable Of ContentsNext Page

Postgres Plus Advanced Server Oracle Compatibility Developer's Guide

 

11.2 EDB*Loader

EDB*Loader is a high-performance bulk data loader that provides an Oracle compatible interface for Postgres Plus Advanced Server. The EDB*Loader command line utility loads data from an input source, typically a file, into one or more tables using a subset of the parameters offered by Oracle SQL*Loader.

EDB*Loader features the following:

    ● Support of the Oracle SQL*Loader data loading methods: conventional path load, direct path load, and parallel direct path load

    ● Oracle SQL*Loader compatible syntax for control file directives

    ● Input data with delimiter-separated or fixed-width fields

    ● Bad file for collecting rejected records

    ● Loading of multiple target tables

    ● Discard file for collecting records that do not meet the selection criteria of any target table

    ● Log file for recording the EDB*Loader session and any error messages

    ● Data loading from standard input and remote loading, particularly useful for large data sources on remote hosts

These features are explained in detail in the following sections.

11.2.1 Data Loading Methods

As with Oracle SQL*Loader, EDB*Loader supports three data loading methods:

    ● Conventional path load

    ● Direct path load

    ● Parallel direct path load

Conventional path load is the default method used by EDB*Loader. Basic insert processing is used to add rows to the table.

The advantage of a conventional path load over the other methods is that table constraints and database objects defined on the table such as primary keys, not null constraints, check constraints, unique indexes, foreign key constraints, and triggers are enforced during a conventional path load.

The exception is that Postgres Plus Advanced Server rules defined on the table are not enforced. EDB*Loader can load tables on which rules are defined, but the rules are not executed. As a consequence, partitioned tables implemented using rules cannot be loaded using EDB*Loader.

Note: Postgres Plus Advanced Server rules are created with the CREATE RULE command. Postgres Plus Advanced Server rules are not the same database objects as rules and rule sets used in Oracle.

EDB*Loader also supports direct path loads. A direct path load is faster than a conventional path load, but requires the removal of most types of constraints and triggers from the table. See Section 11.2.5 for information on direct path loads.

Finally, EDB*Loader supports parallel direct path loads. A parallel direct path load provides even greater performance improvement by permitting multiple EDB*Loader sessions to run simultaneously to load a single table. See Section 11.2.6 for information on parallel direct path loads.

11.2.2 General Usage

EDB*Loader can load data files with either delimiter-separated or fixed-width fields in UTF-8 and single-byte character sets. The delimiter must be a single-byte character. Data file encoding and the database encoding are expected to be the same. Character set conversion of the data file to the database encoding is not supported.

Each EDB*Loader session runs as a single, independent transaction. If an error should occur during the EDB*Loader session that aborts the transaction, all changes made during the session are rolled back.

Generally, formatting errors in the data file do not result in an aborted transaction. Instead, the badly formatted records are written to a text file called the bad file. The reason for the error is recorded in the log file.

Records causing database integrity errors do result in an aborted transaction and rollback. As with formatting errors, the record causing the error is written to the bad file and the reason is recorded in the log file.

Note: EDB*Loader differs from Oracle SQL*Loader in that a database integrity error results in a rollback in EDB*Loader. In Oracle SQL*Loader, only the record causing the error is rejected. Records that were previously inserted into the table are retained and loading continues after the rejected record.

The following are examples of types of formatting errors that do not abort the transaction:

    ● Attempt to load non-numeric value into a numeric column

    ● Numeric value is too large for a numeric column

    ● Character value is too long for the maximum length of a character column

    ● Attempt to load improperly formatted date value into a date column

The following are examples of types of database errors that abort the transaction and result in the rollback of all changes made in the EDB*Loader session:

    ● Violation of a unique constraint such as a primary key or unique index

    ● Violation of a referential integrity constraint

    ● Violation of a check constraint

    ● Error thrown by a trigger fired as a result of inserting rows

11.2.3 EDB*Loader Control File

When you invoke EDB*Loader, the list of arguments must include the name of a control file. The control file includes the instructions that EDB*Loader uses to load the table or tables from the input data file. The control file includes information such as the following:

    ● Name of the input data file containing the data to be loaded

    ● Name of the table or tables to be loaded from the data file

    ● Names of the columns within the table or tables and their corresponding field placement in the data file

    ● Specification of whether the data file uses a delimiter character to separate the fields, or if the fields occupy fixed column positions

    ● Optional selection criteria to choose which records from the data file to load into a given table

    ● Name of the bad file for collecting illegally formatted records from the data file

    ● Name of the discard file for collecting records from the data file that do not meet the selection criteria of any table

The syntax for the EDB*Loader control file is as follows:

[ OPTIONS (param=value [, param=value ] ...) ]
LOAD DATA
  INFILE '{ data_file | stdin }'
  [ BADFILE 'bad_file' ]
  [ DISCARDFILE 'discard_file' ]
[ INSERT | APPEND | REPLACE | TRUNCATE ]
{ INTO TABLE target_table
  [ WHEN field_condition [ AND field_condition ] ...]
  [ FIELDS TERMINATED BY 'termchar'
    [ OPTIONALLY ENCLOSED BY 'enclchar' ] ]
  [ TRAILING NULLCOLS ]
   (field_def [, field_def ] ...)
} ...

where field_def defines a data field, or simply, a field, in data_file that describes the location, data format, or value of the data to be inserted into column_name of target_table. The syntax of field_def is the following:

column_name {
  CONSTANT val |
  FILLER [ POSITION (start:end) ] [ fieldtype ] |
  [ POSITION (start:end) ] [ fieldtype ] [ "expr" ]
}

where fieldtype is one of:

CHAR | INTEGER EXTERNAL | FLOAT EXTERNAL | DECIMAL EXTERNAL |
  ZONED EXTERNAL | DATE [ "datemask" ]

Description

The specification of data_file, bad_file, and discard_file may include the full directory path or a relative directory path to the file name. If the file name is specified alone or with a relative directory path, the file is then assumed to exist (in the case of data_file), or is created (in the case of bad_file or discard_file), relative to the current working directory from which edbldr is invoked.

The operating system account enterprisedb must have read permission on the directory and file specified by data_file.

The operating system account enterprisedb must have write permission on the directories where bad_file and discard_file are to be written.

Note: It is suggested that the file names for data_file, bad_file, and discard_file include extensions of .dat, .bad, and .dsc, respectively. If the provided file name does not contain an extension, EDB*Loader assumes the actual file name includes the appropriate aforementioned extension.

If an EDB*Loader session results in data format errors and the BADFILE clause is not specified, nor is the BAD parameter given on the command line when edbldr is invoked, a bad file is created with the name control_file_base.bad in the current working directory from which edbldr is invoked. control_file_base is the base name of the control file (that is, the file name without any extension) used in the edbldr session.

If the DISCARDFILE clause is not specified, nor is the DISCARD parameter given on the command line when edbldr is invoked, the discard file is not created even if the EDB*Loader session results in discarded records.

If one of INSERT, APPEND, REPLACE, or TRUNCATE is specified, it establishes the default action of how rows are to be added to target tables. If omitted, the default action is as if INSERT had been specified.

If the FIELDS TERMINATED BY clause is specified, then the POSITION (start:end) clause may not be specified for any field_def. Alternatively if the FIELDS TERMINATED BY clause is not specified, then every field_def must contain the POSITION (start:end) clause, excluding those with the CONSTANT clause.

Parameters

param=value

param=value represents one or more of the following command line parameter/value pairs discussed in Section 11.2.4: skip, errors, skip_index_maintenance, direct, parallel.

If a parameter is specified in both the OPTIONS clause and on the command line when edbldr is invoked, the command line setting is used.

data_file

File containing the data to be loaded into target_table. Each record in the data file corresponds to a row to be inserted into target_table.

If an extension is not provided in the file name, EDB*Loader assumes the file has an extension of .dat, for example, mydatafile.dat.

stdin

Specify stdin (all lowercase letters) if you want to use standard input to pipe the data to be loaded directly to EDB*Loader. This is useful for data sources generating a large number of records to be loaded.

bad_file

File that receives data_file records that cannot be loaded due to errors.

If an extension is not provided in the file name, EDB*Loader assumes the file has an extension of .bad, for example, mybadfile.bad.

Note: If the BAD parameter is specified on the command line when edbldr is invoked, the file given by the command line BAD parameter is used instead.

discard_file

File that receives input data records that are not loaded into any table because none of the selection criteria are met for tables with the WHEN clause, and there are no tables without a WHEN clause. (All records meet the selection criteria of a table without a WHEN clause.)

If an extension is not provided in the file name, EDB*Loader assumes the file has an extension of .dsc, for example, mydiscardfile.dsc.

Note: If the DISCARD parameter is specified on the command line when edbldr is invoked, the file given by the command line DISCARD parameter is used instead.

INSERT | APPEND | REPLACE | TRUNCATE

Specifies how data is to be loaded into the target tables. If one of INSERT, APPEND, REPLACE, or TRUNCATE is specified, it establishes the default action for all tables, overriding the default of INSERT.

INSERT

        Data is to be loaded into an empty table. EDB*Loader throws an exception and does not load any data if the table is not initially empty.

        Note: If the table contains rows, the TRUNCATE command must be used to empty the table prior to invoking EDB*Loader. EDB*Loader throws an exception if the DELETE command is used to empty the table instead of the TRUNCATE command. Oracle SQL*Loader allows the table to be emptied by using either the DELETE or TRUNCATE command.

APPEND

        Data is to be added to any existing rows in the table. The table may be initially empty as well.

REPLACE

        The REPLACE keyword and TRUNCATE keywords are functionally identical. The table is truncated by EDB*Loader prior to loading the new data.

        Note: Delete triggers on the table are not fired as a result of the REPLACE operation.

TRUNCATE

        The table is truncated by EDB*Loader prior to loading the new data. Delete triggers on the table are not fired as a result of the TRUNCATE operation.

target_table

Name of the table into which data is to be loaded. The table name may be schema-qualified (for example, enterprisedb.emp). The specified target must not be a view.

field_condition

Conditional clause taking the following form:

[ ( ] (start:end) { = | != | <> } 'val' [ ) ]

start and end are positive integers specifying the column positions in data_file that mark the beginning and end of a field that is to be compared with the constant val. The first character in each record begins with a start value of 1.

In the WHEN field_condition [ AND field_condition ] clause, if all such conditions evaluate to true for a given record, then EDB*Loader attempts to insert that record into target_table. If the insert operation fails, the record is written to bad_file.

If for a given record, none of the WHEN clauses evaluate to true for all INTO TABLE clauses, the record is written to discard_file, if a discard file was specified for the EDB*Loader session.

termchar

Single character that separates each field in data_file. Two consecutive appearances of termchar with no intervening character results in the corresponding column set to null.

enclchar

Single character used to enclose a field value in data_file. Use enclchar on fields where termchar appears as part of the data.

TRAILING NULLCOLS

If TRAILING NULLCOLS is specified, then the columns in the column list for which there is no data in data_file for a given record, are set to null when the row is inserted. This applies only to one or more consecutive columns at the end of the column list.

If fields are omitted at the end of a record and TRAILING NULLCOLS is not specified, EDB*Loader assumes the record contains formatting errors and writes it to the bad file.

column_name

Name of a column in target_table into which a field value defined by field_def is to be inserted.

CONSTANT val

Specifies a constant that is type-compatible with the column data type to which it is assigned in a field definition. Single or double quotes may enclose val. If val contains white space, then enclosing quotation marks must be used.

The use of the CONSTANT clause completely determines the value to be assigned to a column in each inserted row. No other clause may appear in the same field definition.

If the TERMINATED BY clause is used to delimit the fields in data_file, there must be no delimited field in data_file corresponding to any field definition with a CONSTANT clause. In other words, EDB*Loader assumes there is no field in data_file for any field definition with a CONSTANT clause.

FILLER

Specifies that the data in the field defined by the field definition is not to be loaded into the associated column. The column is set to null.

A column name defined with the FILLER clause must not be referenced in a SQL expression. See the discussion of the expr parameter.

POSITION (start:end)

Defines the location of the field in a record in a fixed-width field data file. start and end are positive integers. The first character in the record has a start value of 1.

CHAR | INTEGER EXTERNAL | FLOAT EXTERNAL | DECIMAL EXTERNAL | ZONED EXTERNAL | DATE [ "datemask" ]

Field type that describes the format of the data field in data_file.

Note: Specification of any of these field types is optional. All data in data_file must be in human-readable, text form. Specification of a field type is for descriptive purposes only and has no effect on whether or not EDB*Loader successfully inserts the data in the field into the table column. Successful loading depends upon the column data type and its compatibility with the field value.

For example, a column with data type NUMBER(7,2) successfully accepts a field containing 2600, but if the field contains a value such as 26XX, the insertion fails and the record is written to bad_file.

datemask

Specifies the ordering and abbreviation of the day, month, and year components of a date field.

See Section 3.5.6 for examples of date masks.

Note: If the DATE field type is specified along with a SQL expression for the column, then datemask must be specified after DATE and before the SQL expression. See the following discussion of the expr parameter.

expr

A SQL expression returning a scalar value that is type-compatible with the column data type to which it is assigned in a field definition. Double quotes must enclose expr. expr may contain a reference to any column in the field list (except for fields with the FILLER clause) by prefixing the column name by a colon character (:).

Examples

The following are some examples of control files and their corresponding data files.

The following control file uses a delimiter-separated data file that appends rows to the emp table:

LOAD DATA
  INFILE    'emp.dat'
    BADFILE 'emp.bad'
  APPEND
  INTO TABLE emp
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
  (
    empno,
    ename,
    job,
    mgr,
    hiredate,
    sal,
    deptno,
    comm
  )

In the preceding control file, the APPEND clause is used to allow the insertion of additional rows into the emp table.

The following is the corresponding delimiter-separated data file:

9101,ROGERS,CLERK,7902,17-DEC-10,1980.00,20
9102,PETERSON,SALESMAN,7698,20-DEC-10,2600.00,30,2300.00
9103,WARREN,SALESMAN,7698,22-DEC-10,5250.00,30,2500.00
9104,"JONES, JR.",MANAGER,7839,02-APR-09,7975.00,20

The use of the TRAILING NULLCOLS clause allows the last field supplying the comm column to be omitted from the first and last records. The comm column is set to null for the rows inserted from these records.

The double quotation mark enclosure character surrounds the value JONES, JR. in the last record since the comma delimiter character is part of the field value.

The following query displays the rows added to the table after the EDB*Loader session:

SELECT * FROM emp WHERE empno > 9100;

empno |   ename    |   job    | mgr  |      hiredate      |   sal   |  comm   | deptno
-------+------------+----------+------+--------------------+---------+---------+--------
  9101 | ROGERS     | CLERK    | 7902 | 17-DEC-10 00:00:00 | 1980.00 |         |     20
  9102 | PETERSON   | SALESMAN | 7698 | 20-DEC-10 00:00:00 | 2600.00 | 2300.00 |     30
  9103 | WARREN     | SALESMAN | 7698 | 22-DEC-10 00:00:00 | 5250.00 | 2500.00 |     30
  9104 | JONES, JR. | MANAGER  | 7839 | 02-APR-09 00:00:00 | 7975.00 |         |     20
(4 rows)

The following example is a control file that loads the same rows into the emp table, but uses a data file containing fixed-width fields:

LOAD DATA
  INFILE        'emp_fixed.dat'
    BADFILE     'emp_fixed.bad'
  APPEND
  INTO TABLE emp
    TRAILING NULLCOLS
  (
    empno       POSITION (1:4),
    ename       POSITION (5:14),
    job         POSITION (15:23),
    mgr         POSITION (24:27),
    hiredate    POSITION (28:38),
    sal         POSITION (39:46),
    deptno      POSITION (47:48),
    comm        POSITION (49:56)
  )

In the preceding control file, the FIELDS TERMINATED BY and OPTIONALLY ENCLOSED BY clauses are absent. Instead, each field now includes the POSITION clause.

The following is the corresponding data file containing fixed-width fields:

9101ROGERS    CLERK    790217-DEC-10   1980.0020
9102PETERSON  SALESMAN 769820-DEC-10   2600.0030 2300.00
9103WARREN    SALESMAN 769822-DEC-10   5250.0030 2500.00
9104JONES, JR.MANAGER  783902-APR-09   7975.0020

The following control file illustrates the use of the FILLER clause in the data fields for the sal and comm columns. EDB*Loader ignores the values in these fields and sets the corresponding columns to null.

LOAD DATA
  INFILE        'emp_fixed.dat'
    BADFILE     'emp_fixed.bad'
  APPEND
  INTO TABLE emp
    TRAILING NULLCOLS
  (
    empno       POSITION (1:4),
    ename       POSITION (5:14),
    job         POSITION (15:23),
    mgr         POSITION (24:27),
    hiredate    POSITION (28:38),
    sal         FILLER POSITION (39:46),
    deptno      POSITION (47:48),
    comm        FILLER POSITION (49:56)
  )

Using the same fixed-width data file as in the prior example, the resulting rows in the table appear as follows:

SELECT * FROM emp WHERE empno > 9100;

empno |      ename       |   job    | mgr  |      hiredate      | sal | comm | deptno
-------+------------------+----------+------+--------------------+-----+------+--------
  9101 | ROGERS           | CLERK    | 7902 | 17-DEC-10 00:00:00 |     |      |     20
  9102 | PETERSON         | SALESMAN | 7698 | 20-DEC-10 00:00:00 |     |      |     30
  9103 | WARREN           | SALESMAN | 7698 | 22-DEC-10 00:00:00 |     |      |     30
  9104 | JONES, JR.       | MANAGER  | 7839 | 02-APR-09 00:00:00 |     |      |     20
(4 rows)

The following example illustrates the use of multiple INTO TABLE clauses. For this example, two empty tables are created with the same data definition as the emp table. The following CREATE TABLE commands create these two empty tables, while inserting no rows from the original emp table:

CREATE TABLE emp_research AS SELECT * FROM emp WHERE deptno = 99;
CREATE TABLE emp_sales AS SELECT * FROM emp WHERE deptno = 99;

The following control file contains two INTO TABLE clauses. Also note that there is no APPEND clause so the default operation of INSERT is used, which requires that tables emp_research and emp_sales be empty.

LOAD DATA
  INFILE        'emp_multitbl.dat'
    BADFILE     'emp_multitbl.bad'
    DISCARDFILE 'emp_multitbl.dsc'
  INTO TABLE emp_research
    WHEN (47:48) = '20'
    TRAILING NULLCOLS
  (
    empno       POSITION (1:4),
    ename       POSITION (5:14),
    job         POSITION (15:23),
    mgr         POSITION (24:27),
    hiredate    POSITION (28:38),
    sal         POSITION (39:46),
    deptno      CONSTANT '20',
    comm        POSITION (49:56)
  )
  INTO TABLE emp_sales
    WHEN (47:48) = '30'
    TRAILING NULLCOLS
  (
    empno       POSITION (1:4),
    ename       POSITION (5:14),
    job         POSITION (15:23),
    mgr         POSITION (24:27),
    hiredate    POSITION (28:38),
    sal         POSITION (39:46),
    deptno      CONSTANT '30',
    comm        POSITION (49:56) "ROUND(:comm + (:sal * .25), 0)"
  )

The WHEN clauses specify that when the field designated by columns 47 thru 48 contains 20, the record is inserted into the emp_research table and when that same field contains 30, the record is inserted into the emp_sales table. If neither condition is true, the record is written to the discard file named emp_multitbl.dsc.

The CONSTANT clause is given for column deptno so the specified constant value is inserted into deptno for each record. When the CONSTANT clause is used, it must be the only clause in the field definition other than the column name to which the constant value is assigned.

Finally, column comm of the emp_sales table is assigned a SQL expression. Column names may be referenced in the expression by prefixing the column name with a colon character (:).

The following is the corresponding data file:

9101ROGERS    CLERK    790217-DEC-10   1980.0020
9102PETERSON  SALESMAN 769820-DEC-10   2600.0030 2300.00
9103WARREN    SALESMAN 769822-DEC-10   5250.0030 2500.00
9104JONES, JR.MANAGER  783902-APR-09   7975.0020
9105ARNOLDS   CLERK    778213-SEP-10   3750.0010
9106JACKSON   ANALYST  756603-JAN-11   4500.0040

Since the records for employees ARNOLDS and JACKSON contain 10 and 40 in columns 47 thru 48, which do not satisfy any of the WHEN clauses, EDB*Loader writes these two records to the discard file, emp_multitbl.dsc, whose content is shown by the following:

9105ARNOLDS   CLERK    778213-SEP-10   3750.0010
9106JACKSON   ANALYST  756603-JAN-11   4500.0040

The following are the rows loaded into the emp_research and emp_sales tables:

SELECT * FROM emp_research;

empno |   ename    |   job   | mgr  |      hiredate      |   sal   | comm | deptno
-------+------------+---------+------+--------------------+---------+------+--------
  9101 | ROGERS     | CLERK   | 7902 | 17-DEC-10 00:00:00 | 1980.00 |      |  20.00
  9104 | JONES, JR. | MANAGER | 7839 | 02-APR-09 00:00:00 | 7975.00 |      |  20.00
(2 rows)

SELECT * FROM emp_sales;

empno |  ename   |   job    | mgr  |      hiredate      |   sal   |  comm   | deptno
-------+----------+----------+------+--------------------+---------+---------+--------
  9102 | PETERSON | SALESMAN | 7698 | 20-DEC-10 00:00:00 | 2600.00 | 2950.00 |  30.00
  9103 | WARREN   | SALESMAN | 7698 | 22-DEC-10 00:00:00 | 5250.00 | 3813.00 |  30.00
(2 rows)

11.2.4 Invoking EDB*Loader

You must have superuser privileges to run EDB*Loader. Use the following command to invoke EDB*Loader from the command line:

edbldr [ -d dbname ] [ -p port ] [ -h host ]
[ USERID={ username/password | username/ | username | / } ]
  CONTROL=control_file
[ BAD=bad_file ]
[ DISCARD=discard_file ]
[ LOG=log_file ]
[ PARFILE=param_file ]
[ SKIP=skip_count ]
[ ERRORS=error_count ]
[ SKIP_INDEX_MAINTENANCE={ FALSE | TRUE }
[ DIRECT={ FALSE | TRUE } ]
[ PARALLEL={ FALSE | TRUE } ]

Description

If the -d option, the -p option, or the -h option are omitted, the defaults for the database, port, and host are determined according to the same rules as other Postgres Plus Advanced Server utility programs such as edb-psql, for example.

Any parameter listed in the preceding syntax diagram except for the -d option, -p option, -h option, and the PARFILE parameter may be specified in a parameter file. The parameter file is specified on the command line when edbldr is invoked using PARFILE=param_file. Some parameters may be specified in the OPTIONS clause in the control file. See the description of the OPTIONS clause in Section 11.2.3.

The specification of control_file, bad_file, discard_file, log_file, and param_file may include the full directory path or a relative directory path to the file name. If the file name is specified alone or with a relative directory path, the file is assumed to exist (in the case of control_file or param_file), or to be created (in the case of bad_file, discard_file, or log_file) relative to the current working directory from which edbldr is invoked.

The operating system account used to invoke edbldr must have read permission on the directories and files specified by control_file and param_file.

The operating system account enterprisedb must have write permission on the directories where bad_file, discard_file, and log_file are to be written.

Note: It is suggested that the file names for control_file, bad_file, discard_file, and log_file include extensions of .ctl, .bad, .dsc, and .log, respectively. If the provided file name does not contain an extension, EDB*Loader assumes the actual file name includes the appropriate aforementioned extension.

Parameters

dbname

Name of the database containing the tables to be loaded.

port

Port number on which the database server is accepting connections.

host

IP address of the host on which the database server is running.

USERID={ username/password | username/ | username | / }

EDB*Loader connects to the database with username. username must be a superuser. password is the password for username.

If the USERID parameter is omitted, EDB*Loader prompts for username and password. If USERID=username/ is specified, the EDB*Loader 1) uses the password file specified by environment variable PGPASSFILE if PGPASSFILE is set, or 2) uses the .pgpass password file (pgpass.conf on Windows systems) if PGPASSFILE is not set. If USERID=username is specified, then EDB*Loader prompts for password. If USERID=/ is specified, the connection is attempted using the operating system account as the user name.

Note: The Postgres Plus Advanced Server connection environment variables PGUSER and PGPASSWORD are ignored by EDB*Loader. See the PostgreSQL core documentation for information on the PGPASSFILE environment variable and the password file.

control_file

Control file containing the directives to be used by EDB*Loader. If a file extension is not provided, an extension of .ctl is assumed. See Section 11.2.3 for a description of the control file.

bad_file

File that receives input data records that cannot be loaded due to errors. See the description of bad_file in Section 11.2.3.

Note: The specification of bad_file using the command line BAD parameter overrides the BADFILE clause in the control file.

discard_file

File that receives input data records that do not meet any table’s selection criteria. See the description of discard_file in Section 11.2.3.

Note: The specification of discard_file using the command line DISCARD parameter overrides the DISCARDFILE clause in the control file.

log_file

File in which EDB*Loader records the results of the EDB*Loader session.

If the LOG parameter is omitted, EDB*Loader creates a log file with the name control_file_base.log in the current working directory from which edbldr is invoked. control_file_base is the base name of the control file used in the EDB*Loader session. The operating system account enterprisedb must have write permission on the directory where the log file is to be written.

param_file

Parameter file which EDB*Loader uses to obtain command line parameters for the EDB*Loader session. Any command line parameter listed in this section except for the -d, -p, and -h options, and the PARFILE parameter itself, can be specified in param_file instead of on the command line.

Any parameter given in param_file overrides the same parameter supplied on the command line before the PARFILE option. Any parameter given on the command line that appears after the PARFILE option overrides the same parameter given in param_file.

Note: Unlike other EDB*Loader files, there is no default file name or extension assumed for param_file, though by Oracle SQL*Loader convention, .par is typically used, but not required, as an extension.

skip_count

Number of records at the beginning of the input data file that should be skipped before loading begins. The default is 0.

error_count

Number of errors permitted before aborting the EDB*Loader session. The default is 50.

SKIP_INDEX_MAINTENANCE= { FALSE | TRUE }

If set to TRUE, index maintenance is not performed as part of a direct path load. Indexes on the loaded table are marked as invalid. Use the REINDEX command to rebuild the indexes. The default is FALSE.

DIRECT= { FALSE | TRUE }

If set to TRUE, a direct path load is performed instead of a conventional path load. See Section 11.2.5 for information on direct path loads. The default is FALSE.

PARALLEL= { FALSE | TRUE }

If set to TRUE, it is assumed this EDB*Loader session is one of a number of concurrent EDB*Loader sessions participating in a parallel direct path load. The DIRECT=TRUE parameter must also be set when PARALLEL=TRUE. See Section 11.2.6 for information on parallel direct path loads. The default is FALSE.

Examples

In the following example EDB*Loader is invoked using a control file named emp.ctl located in the current working directory to load a table in database edb:

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb USERID=enterprisedb/password CONTROL=emp.ctl
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

Successfully loaded (4) records

In the following example, EDB*Loader prompts for the user name and password since they are omitted from the command line. In addition, the files for the bad file and log file are specified with the BAD and LOG command line parameters.

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb CONTROL=emp.ctl BAD=/tmp/emp.bad LOG=/tmp/emp.log
Enter the user name : enterprisedb
Enter the password :
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

Successfully loaded (4) records

The following example runs EDB*Loader with the same parameters as shown in the preceding example, but using a parameter file located in the current working directory. The SKIP and ERRORS parameters are altered from their defaults in the parameter file as well.

The parameter file, emp.par, contains the following:

CONTROL=emp.ctl
BAD=/tmp/emp.bad
LOG=/tmp/emp.log
SKIP=1
ERRORS=10

EDB*Loader is invoked with the parameter file as shown by the following:

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb PARFILE=emp.par
Enter the user name : enterprisedb
Enter the password :
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

Successfully loaded (3) records

11.2.5 Direct Path Load

During a direct path load, EDB*Loader writes the data directly to the database pages, which is then synchronized to disk. The insert processing associated with a conventional path load is bypassed, thereby resulting in a performance improvement.

However, bypassing insert processing reduces the types of constraints that may exist on the target table.

The following types of constraints are permitted on the target table of a direct path load:

    ● Primary key

    ● Not null constraints

    ● Indexes (unique or non-unique)

The restrictions on the target table of a direct path load are the following:

    ● Triggers are not permitted

    ● Check constraints are not permitted

    ● Foreign key constraints on the target table referencing another table are not permitted

    ● Foreign key constraints on other tables referencing the target table are not permitted

    ● The table must not be partitioned

    ● Rules may exist on the target table, but they are not executed

Note: Currently, a direct path load in EDB*Loader is more restrictive than in Oracle SQL*Loader. The preceding restrictions do not apply to Oracle SQL*Loader in most cases.

In addition, the following restrictions apply to a control file used in a direct path load:

    ● Multiple table loads are not supported. That is, only one INTO TABLE clause may be specified in the control file.

    ● SQL expressions may not be used in the data field definitions of the INTO TABLE clause.

To run a direct path load, add the DIRECT=TRUE option as shown by the following example:

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb USERID=enterprisedb/password CONTROL=emp.ctl DIRECT=TRUE
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

Successfully loaded (4) records

11.2.6 Parallel Direct Path Load

The performance of a direct path load can be further improved by distributing the loading process over two or more sessions running concurrently. Each session runs a direct path load into the same table.

Since the same table is loaded from multiple sessions, the input records to be loaded into the table must be divided amongst several data files so that each EDB*Loader session uses its own data file and the same record is not loaded more than once into the table.

The target table of a parallel direct path load is under the same restrictions as a direct path load run in a single session. See Section 11.2.5 for the restrictions on direct path loads.

In addition, the APPEND clause must be specified in the control file used by each EDB*Loader session.

To run a parallel direct path load, run EDB*Loader in a separate session for each participant of the parallel direct path load. Invocation of each such EDB*Loader session must include the DIRECT=TRUE and PARALLEL=TRUE parameters.

Each EDB*Loader session runs as an independent transaction so if one of the parallel sessions aborts and rolls back its changes, the loading done by the other parallel sessions are not affected.

Note: In a parallel direct path load, each EDB*Loader session reserves a fixed number of blocks in the target table in a round-robin fashion. Some of the blocks in the last allocated chunk may not be used, and those blocks remain uninitialized. A subsequent use of the VACUUM command on the target table may show warnings regarding these uninitialized blocks such as the following:

WARNING:  relation "emp" page 98264 is uninitialized --- fixing

WARNING:  relation "emp" page 98265 is uninitialized --- fixing

WARNING:  relation "emp" page 98266 is uninitialized --- fixing

This is an expected behavior and does not indicate data corruption.

Indexes on the target table are not updated during a parallel direct path load and are therefore marked as invalid after the load is complete. You must use the REINDEX command to rebuild the indexes.

The following example shows the use of a parallel direct path load on the emp table.

Note: If you attempt a parallel direct path load on the sample emp table provided with Postgres Plus Advanced Server, you must first remove the triggers and constraints referencing the emp table. In addition the primary key column, empno, was expanded from NUMBER(4) to NUMBER in this example to allow for the insertion of a larger number of rows.

The following is the control file used in the first session:

LOAD DATA
  INFILE    '/home/user/loader/emp_parallel_1.dat'
  APPEND
  INTO TABLE emp
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
  (
    empno,
    ename,
    job,
    mgr,
    hiredate,
    sal,
    deptno,
    comm
  )

The APPEND clause must be specified in the control file for a parallel direct path load.

The following shows the invocation of EDB*Loader in the first session. The DIRECT=TRUE and PARALLEL=TRUE parameters must be specified.

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb USERID=enterprisedb/password CONTROL=emp_parallel_1.ctl DIRECT=TRUE PARALLEL=TRUE
WARNING:  index maintenance will be skipped with PARALLEL load
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

The control file used for the second session appears as follows. Note that it is the same as the one used in the first session, but uses a different data file.

LOAD DATA
  INFILE    '/home/user/loader/emp_parallel_2.dat'
  APPEND
  INTO TABLE emp
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    TRAILING NULLCOLS
  (
    empno,
    ename,
    job,
    mgr,
    hiredate,
    sal,
    deptno,
    comm
  )

The preceding control file is used in a second session as shown by the following:

$ /opt/PostgresPlus/9.0AS/bin/edbldr -d edb USERID=enterprisedb/password CONTROL=emp_parallel_2.ctl DIRECT=TRUE PARALLEL=TRUE
WARNING:  index maintenance will be skipped with PARALLEL load
EDB*Loader: Copyright (c) 2007-2010, EnterpriseDB Corporation.

EDB*Loader displays the following message in each session when its respective load operation completes:

Successfully loaded (10000) records

The following query shows that the index on the emp table has been marked as INVALID:

SELECT index_name, status FROM user_indexes WHERE table_name = 'EMP';

 index_name | status
------------+---------
 EMP_PK     | INVALID
(1 row)

Note: user_indexes is the Oracle compatible view of indexes owned by the current user.

Queries on the emp table will not utilize the index unless it is rebuilt using the REINDEX command as shown by the following:

REINDEX INDEX emp_pk;

A subsequent query on user_indexes shows that the index is now marked as VALID:

SELECT index_name, status FROM user_indexes WHERE table_name = 'EMP';

 index_name | status
------------+--------
 EMP_PK     | VALID
(1 row)

11.2.7 Remote Loading

EDB*Loader supports a feature called remote loading. In remote loading, the database containing the table to be loaded is running on a database server on a different host than from where EDB*Loader is invoked with the input data source.

This feature is useful if you have a large amount of data to be loaded, and you do not want to create a large data file on the host running the database server.

In addition, you can use the standard input feature to pipe the data from the data source such as another program or script, directly to EDB*Loader, which then loads the table in the remote database. This bypasses the process of having to create a data file on disk for EDB*Loader.

Performing remote loading along with using standard input requires the following:

    ● The edbldr program must be installed on the client host on which it is to be invoked with the data source for the EDB*Loader session.

    ● The control file must contain the clause INFILE 'stdin' so you can pipe the data directly into EDB*Loader’s standard input. See Section 11.2.3 for information on the INFILE clause and the EDB*Loader control file.

    ● All files used by EDB*Loader such as the control file, bad file, discard file, and log file must reside on, or are created on, the client host on which edbldr is invoked.

    ● When invoking EDB*Loader, use the -h option to specify the IP address of the remote database server. See Section 11.2.4 for information on invoking EDB*Loader.

    ● Use the operating system pipe operator (|) or input redirection operator (<) to supply the input data to EDB*Loader.

The following example loads a database running on a database server at 192.168.1.14 using data piped from a source named datasource.

datasource | ./edbldr -d edb -h 192.168.1.14 USERID=enterprisedb/password CONTROL=remote.ctl

The following is another example of how standard input can be used:

./edbldr -d edb -h 192.168.1.14 USERID=enterprisedb/password CONTROL=remote.ctl < datasource

Previous PageTable Of ContentsNext Page