Controlling schema ingestion Innovation Release

The beacon-agent.yaml monitoring template allows the Agent to ingest all schemas that the migration user has permissions to and contain at least one table.

You can add further configurations to control schema ingestion.

Defining a subset of schemas for ingestion

If you don't want the Agent to ingest all schemas, you can define a subset with the databases.schema.filter option.

Using the schema filter

To control which schemas the Agent ingests from your database, add the filter parameter within the databases.schema configuration block in your beacong-agent.yaml. This allows you to specify a set of schemas to either include or exclude.

The filter parameter requires two settings:

  • mode: Defines the filtering behavior. It must be set to either include or exclude. When set to include, the Agent only ingests schemas listed in the names list. When set to exclude, the Agent ingests all schemas except those listed in the names list.

  • names: A list of schema names to apply the filter to. Because the names are case-sensitive, ensure you use the same casing for the names as how the schemas are stored in the source database.

Note

The Agent has a default list of system schemas for both Oracle and Postgres that are automatically excluded from ingestion because they are irrelevant for the database migration. When an exclude filter is used, this default list is combined with any additional schemas you specify. For the include filter, the Agent will ingest any schemas you explicitly name, including system schemas that would have otherwise been excluded by default.

Include example

[...]
provider:
  onprem:
    clusters:
      - resource_id: "<cluster_resource_id>"
        [...]
        nodes:
          - resource_id: "<node_resource_id>"
            dsn: $DSN
            schema:
              enabled: true
              poll_interval: 15s 
              filter:
                mode: "include"
                names: 
                  - "<schema_name1>"
                  - "<schema_name2>"
            tags:
              - "<tag_names>"

Exclude example

[...]
provider:
  onprem:
    clusters:
      - resource_id: "<cluster_resource_id>"
       [..]
       nodes:
         - resource_id: "<node_resource_id>"
           dsn: $DSN
           schema:
             enabled: true
             poll_interval: 15s 
             filter:
               mode: "exclude"
               names: 
                 - "<schema_name1>"
                 - "<schema_name2>"
           tags:
             - "<tag_names>"

Excluded system schemas

The Agent excludes the following system schemas by default.

Postgres and EDB Postgres Extended Server

  • information_schema
  • pg_catalog
  • pg_toast

EDB Postgres Advanced Server

  • sys

Oracle

  • ANONYMOUS
  • APEX_PUBLIC_USER
  • APEX_030200
  • APEX_040000
  • APEX_040200
  • APPQOSSYS
  • AUDSYS
  • CTXSYS
  • DMSYS
  • DBSNMP
  • DBSFWUSER
  • DEMO
  • DIP
  • DVF
  • DVSYS
  • EXFSYS
  • FLOWS_FILES
  • FLOWS_020100
  • FRANCK
  • GGSYS
  • GSMADMIN_INTERNAL
  • GSMCATUSER
  • GSMROOTUSER
  • GSMUSER
  • LBACSYS
  • MDDATA
  • MDSYS
  • MGMT_VIEW
  • OJVMSYS
  • OLAPSYS
  • ORDPLUGINS
  • ORDSYS
  • ORDDATA
  • OUTLN
  • ORACLE_OCM
  • OWBSYS
  • OWBYSS_AUDIT
  • PDBADMIN
  • RMAN
  • REMOTE_SCHEDULER_AGENT
  • SI_INFORMTN_SCHEMA
  • SPATIAL_CSW_ADMIN_USR
  • SPATIAL_WFS_ADMIN_USR
  • SQLTXADMIN
  • SQLTXPLAIN
  • SYS
  • SYSBACKUP
  • SYSDG
  • SYSKM
  • SYSRAC
  • SYSTEM
  • SYSMAN
  • TSMSYS
  • WKPROXY
  • WKSYS
  • WK_TEST
  • WMSYS
  • XDB
  • XS$NULL

Performing multiple concurrent schema ingestions

When you configure the EDB Postgres AI agent to ingest schemas from several Oracle databases, you can use the schema_export_max_workers parameter of the beacon-agent.yaml configuration file to control how many schema export jobs the EDB Postgres AI agent runs in parallel.

The schema_export_max_workers sets the maximum number of concurrent schema exports the EDB Postgres AI agent runs across all configured Oracle databases. Because active workers consume CPU, memory, and some disk space on the auxiliary machine that runs EDB Postgres AI agent, you can use this parameter to tune resource consumption, while speeding up or slowing down the bulk schema ingestion.

The schema_export_max_workers is in the provider.onprem section of the beacon-agent.yaml:

... # other beacon-agent.yaml configurations
provider:
  onprem:
    runner:
      enabled: true
    schema_export_max_workers: 10
...

Using the max workers parameter

Start with the default schema_export_max_workers: 10 when configuring initial schema ingestion. Then, monitor the auxiliary machine running the EDB Postgres AI agent (CPU, memory, disk I/O, network).

If the auxiliary machine is underutilized and you need faster ingestion for multiple databases, or if you notice that the databases are taking too long to appear in the Estate > Migrations tab, increase the value and observe the impact.

If resource usage is consistently high or causing contention (for example, you notice spikes in CPU, memory, and disk usage), decrease the value and confirm that the system stabilizes. This slows down total ingestion time but reduces the peak resource usage on the EDB Postgres AI agent host.

After each change, restart the Agent if it is not running as a service with automatic reload.