WarehousePG Copy configuration file
This is the reference for the WarehousePG Copy TOML-based configuration file.
Note
Command-line arguments take precedence over settings defined in a TOML file.
You can generate a sample configuration file using the whpg-copy config-example command.
# ==============================================================================
# whpg-copy Configuration Example
# ==============================================================================

# Source database connection URL.
# Supports standard PostgreSQL connection strings.
src_url = "postgres://gpadmin:@10.0.0.1:5432/source_db"

# Destination database connection URL.
dst_url = "postgres://gpadmin:@10.0.0.2:5432/target_db"

# Tables to include in the copy operation. To have uppercase letters or other
# special characters in schema or table names, follow PostgreSQL's qualified
# identifier rules to quote them.
# Default: None
include_table = [
    "public.users",
    "\"Inventory\".\"StockItems\""
]

# Tables to exclude.
# Default: None
exclude_table = [
    "public.temp_cache",
    "sales.test_data"
]

# Enable compression during data transfer to reduce network bandwidth usage.
# Recommended for transfers over WAN or slow networks.
# Default: true
compression = true

# Copy partitioned tables through their leaf partitions in parallel.
# Disable this to force data to go through the root or an intermediate
# partition table.
# Default: true
through_partition_leaves = true

# How to handle existing tables on the destination:
#   "append"        : Insert data into existing tables.
#   "truncate"      : Clear the destination table before copying.
#   "skip-existing" : Do not copy if the table already exists.
# Default: append
target_mode = "append"

# Validation method to perform after copying data:
#   "none"     : No validation.
#   "count"    : Compare row counts between source and destination.
#   "checksum" : Calculate and compare data hashes.
# Default: none
validate_method = "count"

# Number of parallel workers to run.
# Default: 4
workers = 4

# Listening port range on the destination for data transfer. These ports
# must be reachable from the source segments. whpg-copy tries to listen on
# the ports one by one within the range.
# At least one port is required in the range.
port_range = "60000-60001"

# Define rules to rename schemas or tables during the copy process.
# Each rule requires at least a source pattern.

# public.sales -> new_schema.sales
[[mapping_rules]]
src_table = "sales"
dst_schema = "new_schema"

# old_schema.raw_logs -> new_schema.processed_logs
[[mapping_rules]]
src_schema = "old_schema"
src_table = "raw_logs"
dst_schema = "new_schema"
dst_table = "processed_logs"

[[mapping_rules]]
src_schema = "old_schema(\\d+)"
dst_schema = "new_schema${1}"
src_table = "old_table(\\d+)"
dst_table = "new_table${1}"

# Run the operation without actually changing any data on the destination.
# Default: false
dry_run = false
Keywords and values
src_url
Connection string for the source database. Supports standard PostgreSQL connection strings. It follows the format postgres://[user@]host[:port][/dbname].
dst_url
Connection string for the destination database. It follows the format postgres://[user@]host[:port][/dbname].
include_table
Specifies tables to include. Use the format schema.table to specify the relations. If you are using special characters, follow PostgreSQL's qualified identifier rules to quote them.
exclude_table
Specifies tables to exclude. Uses the same format as include_table.
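As a sketch of how the two lists combine, the snippet below includes a mixed-case schema alongside an ordinary one and excludes a single table. The table names here are hypothetical and shown only to illustrate the quoting rules.

```toml
# Mixed-case identifiers must be double-quoted per PostgreSQL rules;
# all-lowercase names need no quoting. (Hypothetical table names.)
include_table = [
    "\"Sales\".\"Orders\"",
    "public.customers"
]
exclude_table = [
    "public.customers_tmp"
]
```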
compression
Enables or disables ZSTD compression during data transfer. Recommended for transfers over WAN or slow networks. Default is true.
through_partition_leaves
If true (default), copies data directly between leaf partitions in parallel. If false, data goes through the specified root/intermediate partition table.
target_mode
Determines how to handle existing tables on the destination. The supported options are:
- append: (Default) Inserts data into existing tables.
- truncate: Truncates the destination table before copying.
- skip-existing: Skips the copy operation if the table already exists.
validate_method
Validation to perform after copying. The supported options are:
- none: (Default) No validation.
- count: Compares row counts.
- checksum: Calculates and compares data hashes.
workers
Specifies the number of concurrent worker tasks. Default is 4.
port_range
Defines the ports on the destination cluster used to receive data from the source segments. whpg-copy scans this range sequentially and binds to the first available port it finds. You must specify at least one port and ensure the entire range is accessible from the source segment hosts.
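One way to size the range, sketched below, is to make it at least as wide as the worker count so that parallel transfers are not starved for ports. Note that this sizing heuristic is an assumption, not a documented requirement; the only stated requirement is that the range contains at least one reachable port.

```toml
workers = 8
# A range wider than the worker count leaves headroom if some ports are
# already in use on the destination hosts. (The one-port-per-worker sizing
# is an assumption for illustration, not a documented requirement.)
port_range = "60000-60015"
```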
mapping_rules
Mapping rules allow for powerful renaming and selection logic using regular expressions (Regex). You can define multiple [[mapping_rules]] blocks in your configuration file. The supported options are:
- src_schema, src_table: Define the source objects. These fields use standard Rust regular expression patterns to match your source objects. Patterns are automatically anchored (wrapped in ^ and $). For example, src_table = "users" matches only the table "users", not "super_users". To match multiple tables, use the .* wildcard.
- dst_schema, dst_table: Define the destination objects. These fields support Regex capture groups. If your source pattern contains groups in parentheses (), you can reference them in the destination using ${1}, ${2}, and so on.
- sql: Custom SQL query to use for extracting data from the source table. Instead of copying the entire table, whpg-copy executes this SQL and copies its result. Supports the placeholders ${src_schema} and ${src_table}; the utility automatically replaces them with the escaped source objects. This is ideal for joining tables, masking sensitive data, or changing data types on the fly.
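For instance, the sql option can mask sensitive data during the copy. The sketch below hashes an email column in flight; the table and column names are hypothetical.

```toml
# Hypothetical rule: copy "users" while replacing the email column with
# its MD5 hash, so the destination never receives raw addresses.
[[mapping_rules]]
src_table = "users"
sql = "SELECT id, name, md5(email) AS email FROM ${src_schema}.${src_table}"
```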
Note
If your mapping rule involves a rename (the destination schema or table name is different from the source), whpg-copy cannot automatically create the table on the destination cluster. You must ensure the destination table exists with the correct schema before initiating the copy.
dry_run
Run the operation without changing any data on the destination (default is false).
Examples
Copy the table public.users to a table named public.customers:

[[mapping_rules]]
src_schema = "public"
src_table = "users"
dst_table = "customers"
Copy all tables in schema legacy to the schema archived:

[[mapping_rules]]
src_schema = "legacy"
src_table = ".*"  # Match all tables in 'legacy' schema
dst_schema = "archived"
Copy using capture groups to dynamically rename tables during the copy operation. By using parentheses () in your src_table pattern, you can "save" parts of the table name and "paste" them into the new name.

[[mapping_rules]]
src_table = "data_(\\d+)_(\\d+)"
dst_table = "record_${1}_v${2}"

The src_table pattern looks for tables starting with data_, followed by two groups of digits. The dst_table template defines how the new table should be named using the saved groups. As a result, a table named data_2023_01 is copied as record_2023_v01.
Use a custom SQL query to copy only orders from 2024 onwards:

[[mapping_rules]]
src_table = "orders"
sql = "SELECT * FROM ${src_schema}.${src_table} WHERE order_date >= '2024-01-01'"