Create a pipeline (Innovation Release)

A pipeline defines how data moves from a source, through one or more transformation steps, to an AI-ready destination. Creating a pipeline is the first step in making your data searchable — without one, there is no process to generate embeddings or keep your knowledge base in sync with your source data.

Use aidb.create_pipeline() to define a pipeline. Its parameters fall into four groups:

| Parameter group | Parameters | More information |
|---|---|---|
| Source | source, source_key_column, source_data_column | This page |
| Steps | step_1 to step_10, step_N_options | Pipeline steps |
| Orchestration | auto_processing, background_sync_interval, batch_size | Orchestration |
| Destination | destination, destination_key_column, destination_data_column | Set automatically by the KnowledgeBase step, or specify explicitly |

Table sources

To read from a Postgres table, set the source parameter to the table name, source_key_column to the unique key column, and source_data_column to the column containing the data to process:

SELECT aidb.create_pipeline(
    name               => 'my_pipeline',
    source             => 'my_table',
    source_key_column  => 'id',
    source_data_column => 'content',
    step_1             => 'KnowledgeBase',
    step_1_options     => aidb.knowledge_base_config('my_model', 'Text')
);
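The example above assumes that my_table already exists. Below is a minimal sketch of a compatible table. The column types (TEXT for both columns) and the sample rows are assumptions, so adapt them to your data. Declaring id as the primary key guarantees the unique values that source_key_column requires. Create the table before calling aidb.create_pipeline().

```sql
-- Hypothetical source table matching the pipeline example:
--   "id"      is the unique key        (source_key_column)
--   "content" holds the text to embed  (source_data_column)
CREATE TABLE my_table (
    id      TEXT PRIMARY KEY,
    content TEXT NOT NULL
);

-- Sample rows for the pipeline to process (illustrative data only).
INSERT INTO my_table (id, content) VALUES
    ('doc-1', 'Postgres stores data in tables.'),
    ('doc-2', 'Pipelines turn table rows into embeddings.'),
    ('doc-3', 'A knowledge base makes the data searchable.');
```

Because id is the primary key, Postgres rejects duplicate keys, so every source row maps to exactly one key value.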

Volume sources (PGFS)

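To process files stored in external cloud storage (S3, GCS, or Azure), pipelines use the Postgres File System (PGFS). PGFS connects Postgres to external object storage through a storage location, and an AIDB volume built on that location gives the pipeline a set of files to scan. Setting this up takes three steps.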

Step 1: Create a storage location

Define the external storage connection using pgfs.create_storage_location:

SELECT pgfs.create_storage_location(
    name     => 'my_s3_location',
    uri      => 's3://my-bucket/my-folder',
    options  => '{"region": "us-east-1"}'
);
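The example above suits a bucket that needs no signed access. For a private bucket, you also need to supply access keys. The sketch below assumes that pgfs.create_storage_location accepts a credentials argument holding a JSON object with access_key_id and secret_access_key keys. Treat that parameter and those key names as assumptions and confirm them in the PGFS documentation before use. The location name and bucket URI are hypothetical.

```sql
-- Sketch for a private S3 bucket.
-- ASSUMPTION: PGFS accepts a "credentials" JSON argument with
-- "access_key_id" and "secret_access_key" keys; verify in the PGFS docs.
SELECT pgfs.create_storage_location(
    name        => 'my_private_s3_location',           -- hypothetical name
    uri         => 's3://my-private-bucket/my-folder',  -- hypothetical bucket
    options     => '{"region": "us-east-1"}',
    credentials => '{"access_key_id": "<your-access-key-id>",
                     "secret_access_key": "<your-secret-access-key>"}'
);
```

Keep real keys out of scripts checked into version control, and substitute them at deployment time instead.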

Step 2: Create a volume

Create an AIDB volume that references the storage location. The pipeline will scan this volume for new or changed files:

SELECT aidb.create_volume(
    name             => 'my_volume',
    storage_location => 'my_s3_location'
);
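Before pointing a pipeline at the volume, you may want to confirm that the volume can see the files in the bucket. The sketch below assumes that AIDB provides an aidb.list_volume_content() function that takes a volume name and returns one row per stored object. Treat the function name and its output columns as assumptions, and check them against the AIDB volume reference.

```sql
-- ASSUMPTION: aidb.list_volume_content() lists the objects visible
-- through the volume; confirm the name in the AIDB reference.
SELECT * FROM aidb.list_volume_content('my_volume');
```

An empty result usually points to a wrong URI, region, or credentials in the storage location rather than a problem with the pipeline itself.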

Step 3: Reference the volume as the pipeline source

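Set the source parameter to the volume name. Unlike the table source example, this example omits source_key_column and source_data_column. It assumes the volume holds PDF files: step_1 (ParsePdf) extracts their text, and step_2 (KnowledgeBase) then processes it: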

SELECT aidb.create_pipeline(
    name           => 'my_pdf_pipeline',
    source         => 'my_volume',
    step_1         => 'ParsePdf',
    step_2         => 'KnowledgeBase',
    step_2_options => aidb.knowledge_base_config('my_model', 'Text')
);

For the full PGFS reference, see the PGFS documentation.