Running a parallel direct path load v16
You can further improve the performance of a direct path load by distributing the loading process over two or more sessions running concurrently. Each session runs a direct path load into the same table.
Since the same table is loaded from multiple sessions, the input records to load into the table must be divided among several data files. This way, each EDB*Loader session uses its own data file, and the same record isn't loaded into the table more than once.
The target table of a parallel direct path load is under the same restrictions as a direct path load run in a single session.
The restrictions on the target table of a direct path load are:
- Triggers aren't permitted.
- Check constraints aren't permitted.
- Foreign key constraints on the target table referencing another table aren't permitted.
- Foreign key constraints on other tables referencing the target table aren't permitted.
- You must not partition the table.
- Rules can exist on the target table, but they aren't executed.
In addition, you must specify the
APPEND clause in the control file used by each EDB*Loader session.
To run a parallel direct path load, run EDB*Loader in a separate session for each participant of the parallel direct path load. You must include the
PARALLEL=TRUE parameters when invoking each such EDB*Loader session.
Each EDB*Loader session runs as an independent transaction. Aborting and rolling back changes of one of the parallel sessions doesn't affect the loading done by the other parallel sessions.
In a parallel direct path load, each EDB*Loader session reserves a fixed number of blocks in the target table using turns. Some of the blocks in the last allocated chunk might not be used, and those blocks remain uninitialized. A later use of the
VACUUM command on the target table might show warnings about these uninitialized blocks, such as the following:
This behavior is expected and doesn't indicate data corruption.
Indexes on the target table aren't updated during a parallel direct path load. They are therefore marked as invalid after the load is complete. You must use the
REINDEX command to rebuild the indexes.
This example shows the use of a parallel direct path load on the
If you attempt a parallel direct path load on the sample
emp table provided with EDB Postgres Advanced Server, you must first remove the triggers and constraints referencing the
emp table. In addition, the primary key column,
empno, was expanded from
NUMBER in this example to allow for inserting more rows.
This is the control file used in the first session:
You must specify the
APPEND clause in the control file for a parallel direct path load.
This example invokes EDB*Loader in the first session. You must specify the
The control file used for the second session appears as follows. It's the same as the one used in the first session, but it uses a different data file.
This control file is used in a second session:
EDB*Loader displays a message in each session when the load operation completes:
This query shows that the index on the emp table was marked
user_indexes is the view of indexes compatible with Oracle databases owned by the current user.
Queries on the
emp table don't use the index unless you rebuild it using the
A later query on
user_indexes shows that the index is now marked as