Controlling schema ingestion Innovation Release

This documentation covers the current Innovation Release of EDB Postgres AI. See also:

Hybrid Manager dual release strategy
Documentation for the current Long-term support release

The beacon-agent.yaml monitoring template allows the Agent to ingest all schemas that the migration user has permissions to and contain at least one table.

You can add further configurations to control schema ingestion.

Defining a subset of schemas for ingestion

If you don't want the Agent to ingest all schemas, you can define a subset with the databases.schema.filter option.

Using the schema filter

To control which schemas the Agent ingests from your database, add the filter parameter within the databases.schema configuration block in your beacong-agent.yaml. This allows you to specify a set of schemas to either include or exclude.

The filter parameter requires two settings:

mode: Defines the filtering behavior. It must be set to either include or exclude. When set to include, the Agent only ingests schemas listed in the names list. When set to exclude, the Agent ingests all schemas except those listed in the names list.
names: A list of schema names to apply the filter to. Because the names are case-sensitive, ensure you use the same casing for the names as how the schemas are stored in the source database.

Note

The Agent has a default list of system schemas for both Oracle and Postgres that are automatically excluded from ingestion because they are irrelevant for the database migration. When an exclude filter is used, this default list is combined with any additional schemas you specify. For the include filter, the Agent will ingest any schemas you explicitly name, including system schemas that would have otherwise been excluded by default.

Include example

[...]
provider:
  onprem:
    clusters:
      - resource_id: "<cluster_resource_id>"
        [...]
        nodes:
          - resource_id: "<node_resource_id>"
            dsn: $DSN
            schema:
              enabled: true
              poll_interval: 15s 
              filter:
                mode: "include"
                names: 
                  - "<schema_name1>"
                  - "<schema_name2>"
            tags:
              - "<tag_names>"

Exclude example

[...]
provider:
  onprem:
    clusters:
      - resource_id: "<cluster_resource_id>"
       [..]
       nodes:
         - resource_id: "<node_resource_id>"
           dsn: $DSN
           schema:
             enabled: true
             poll_interval: 15s 
             filter:
               mode: "exclude"
               names: 
                 - "<schema_name1>"
                 - "<schema_name2>"
           tags:
             - "<tag_names>"

Excluded system schemas

The Agent excludes the following system schemas by default.

Postgres and EDB Postgres Extended Server

information_schema
pg_catalog
pg_toast

EDB Postgres Advanced Server

Oracle

ANONYMOUS
APEX_PUBLIC_USER
APEX_030200
APEX_040000
APEX_040200
APPQOSSYS
AUDSYS
CTXSYS
DMSYS
DBSNMP
DBSFWUSER
DEMO
DIP
DVF
DVSYS
EXFSYS
FLOWS_FILES
FLOWS_020100
FRANCK
GGSYS
GSMADMIN_INTERNAL
GSMCATUSER
GSMROOTUSER
GSMUSER
LBACSYS
MDDATA
MDSYS
MGMT_VIEW
OJVMSYS
OLAPSYS
ORDPLUGINS
ORDSYS
ORDDATA
OUTLN
ORACLE_OCM
OWBSYS
OWBYSS_AUDIT
PDBADMIN
RMAN
REMOTE_SCHEDULER_AGENT
SI_INFORMTN_SCHEMA
SPATIAL_CSW_ADMIN_USR
SPATIAL_WFS_ADMIN_USR
SQLTXADMIN
SQLTXPLAIN
SYS
SYSBACKUP
SYSDG
SYSKM
SYSRAC
SYSTEM
SYSMAN
TSMSYS
WKPROXY
WKSYS
WK_TEST
WMSYS
XDB
XS$NULL

Performing multiple concurrent schema ingestions

When you configure the EDB Postgres AI agent to ingest schemas from several Oracle databases, you can use the schema_export_max_workers parameter of the beacon-agent.yaml configuration file to control how many schema export jobs the EDB Postgres AI agent runs in parallel.

The schema_export_max_workers sets the maximum number of concurrent schema exports the EDB Postgres AI agent runs across all configured Oracle databases. Because active workers consume CPU, memory, and some disk space on the auxiliary machine that runs EDB Postgres AI agent, you can use this parameter to tune resource consumption, while speeding up or slowing down the bulk schema ingestion.

The schema_export_max_workers is in the provider.onprem section of the beacon-agent.yaml:

... # other beacon-agent.yaml configurations
provider:
  onprem:
    runner:
      enabled: true
    schema_export_max_workers: 10
...

Using the max workers parameter

Start with the default schema_export_max_workers: 10 when configuring initial schema ingestion. Then, monitor the auxiliary machine running the EDB Postgres AI agent (CPU, memory, disk I/O, network).

If the auxiliary machine is underutilized and you need faster ingestion for multiple databases, or if you notice that the databases are taking too long to appear in the Estate > Migrations tab, increase the value and observe the impact.

If resource usage is consistently high or causing contention (for example, you notice spikes in CPU, memory, and disk usage), decrease the value and confirm that the system stabilizes. This slows down total ingestion time but reduces the peak resource usage on the EDB Postgres AI agent host.

After each change, restart the Agent if it is not running as a service with automatic reload.

← Prev

Configuring multiple databases

↑ Up

Configuring the EDB Postgres AI agent

Adding usage reporting for Hybrid Manager-monitored databases