Pipelines API reference v7

This page covers the full public API for AI pipelines: the types and enums used across the API, the views for inspecting pipeline state, the core CRUD and execution functions, and the configuration helper functions for pipeline steps and vector indexes. For model configuration helpers, see the Models reference.

Types

aidb.PipelineAutoProcessingMode

Controls how a pipeline automatically processes new or changed data.

CREATE TYPE PipelineAutoProcessingMode AS ENUM (
    'Live',
    'Background',
    'Disabled'
);
Value      | Description
-----------+------------
Live       | Processes new data immediately as it arrives, using Postgres triggers.
Background | Continuously processes data in the background using Postgres workers.
Disabled   | No automated processing. Use aidb.run_pipeline() to trigger manually.
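
To check which values an enum accepts on a particular installation, the built-in Postgres enum functions can help. The following is a minimal sketch, assuming the aidb extension is installed and the type lives in the aidb schema as the heading above suggests:

```sql
-- List the allowed values of the auto-processing enum in declaration order.
-- enum_range() and unnest() are built-in Postgres functions;
-- only the type name comes from aidb.
SELECT unnest(enum_range(NULL::aidb.PipelineAutoProcessingMode)) AS mode;
```

The same pattern should apply to the other enum types on this page by swapping in the type name.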

aidb.PipelineDataFormat

Specifies the format of source data the pipeline processes.

CREATE TYPE PipelineDataFormat AS ENUM (
    'Text',
    'Image',
    'Pdf'
);
Value | Description
------+------------
Text  | Plain text data.
Image | Image data (bytes).
Pdf   | PDF documents.

aidb.PipelineSourceType

Indicates the type of data source a pipeline reads from.

CREATE TYPE PipelineSourceType AS ENUM (
    'Table',
    'Volume',
    'Empty'
);
Value  | Description
-------+------------
Table  | A Postgres table or view.
Volume | A PGFS storage volume.
Empty  | No source; the pipeline generates its own data.

aidb.PipelineDestinationType

Indicates the type of destination a pipeline writes to.

CREATE TYPE PipelineDestinationType AS ENUM (
    'Table',
    'Volume',
    'Empty'
);
Value  | Description
-------+------------
Table  | A Postgres table.
Volume | A PGFS storage volume.
Empty  | No destination; output is discarded.

aidb.PipelineStepOperation

Defines the operation performed by a pipeline step.

CREATE TYPE PipelineStepOperation AS ENUM (
    'ChunkText',
    'SummarizeText',
    'ParseHtml',
    'ParsePdf',
    'PerformOcr',
    'KnowledgeBase',
    'PdfToImage',
    'SemanticKB'
);
Value         | Description
--------------+------------
ChunkText     | Splits text into smaller chunks.
SummarizeText | Summarizes text using a language model.
ParseHtml     | Extracts text content from HTML.
ParsePdf      | Extracts text or images from PDFs.
PerformOcr    | Runs optical character recognition on images.
KnowledgeBase | Computes and stores embeddings in a knowledge base.
PdfToImage    | Converts PDF pages to images.
SemanticKB    | Indexes schema metadata into a semantic knowledge base.

aidb.PipelineStatus

Represents the current processing state of a pipeline.

CREATE TYPE PipelineStatus AS ENUM (
    'Stale',
    'Processing',
    'UpToDate',
    'NoResults',
    'Failed',
    'Unknown',
    'PartialErrors',
    'BlockingErrors'
);
Value          | Description
---------------+------------
Stale          | Source data has changed and the pipeline needs to run.
Processing     | The pipeline is currently executing.
UpToDate       | All source data has been processed successfully.
NoResults      | Processing completed but produced no output.
Failed         | The last execution failed.
PartialErrors  | Some records failed to process; others succeeded.
BlockingErrors | Errors prevent the pipeline from running.
Unknown        | Status cannot be determined.

aidb.DistanceOperator

Specifies the distance metric used for vector similarity search.

CREATE TYPE DistanceOperator AS ENUM (
    'L2',
    'InnerProduct',
    'Cosine',
    'L1',
    'Hamming',
    'Jaccard'
);
Value        | Description
-------------+------------
L2           | Euclidean distance.
InnerProduct | Inner product.
Cosine       | Cosine distance.
L1           | L1 (Manhattan) distance.
Hamming      | Hamming distance.
Jaccard      | Jaccard distance.
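
As a worked illustration of the first four metrics, the sketch below computes them with plain Postgres arithmetic on two small example vectors. It doesn't call aidb or pgvector; it only shows the standard textbook definitions, and the vectors are made up for the example. Note that pgvector's inner product operator returns the negated inner product, so raw operator results can differ in sign from the value computed here.

```sql
-- Plain Postgres arithmetic on two example vectors; no extension required.
WITH pairs AS (
    -- Pair up the elements of the two vectors position by position.
    SELECT a, b
    FROM unnest(ARRAY[1, 2, 3]::float8[],
                ARRAY[4, 5, 6]::float8[]) AS t(a, b)
)
SELECT sqrt(sum((a - b) ^ 2))   AS l2_distance,      -- Euclidean
       sum(abs(a - b))          AS l1_distance,      -- Manhattan
       sum(a * b)               AS inner_product,
       1 - sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))
                                AS cosine_distance   -- 1 minus cosine similarity
FROM pairs;
```

For these inputs the L1 distance is 9 and the inner product is 32.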

Domains

aidb.pipeline_name_50

A TEXT domain enforcing that pipeline names are no longer than 50 characters.

aidb.background_sync_interval

An INTERVAL domain enforcing that background sync intervals are between 1 second and 2 days (inclusive).
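
A value can be checked against these domains before it's passed to a pipeline function by casting to the domain. This is a minimal sketch, assuming the limits are enforced as domain constraints so that a cast validates the value; the sample values are illustrative:

```sql
-- Both casts should succeed because the values satisfy the documented limits.
SELECT 'my_chunker'::aidb.pipeline_name_50        AS valid_name,
       '5 minutes'::aidb.background_sync_interval AS valid_interval;

-- Expected to raise a domain constraint violation,
-- since 3 days exceeds the documented 2-day maximum.
SELECT '3 days'::aidb.background_sync_interval;
```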


Views

aidb.pipelines

Also accessible as aidb.pipes. Lists all registered pipelines and their configuration, including source, destination, processing mode, and step definitions.

Column                   | Type                            | Description
-------------------------+---------------------------------+------------
id                       | integer                         | Internal pipeline identifier.
name                     | text                            | Name of the pipeline.
source_type              | aidb.PipelineSourceType         | Whether the source is a table or volume.
source_schema            | text                            | Schema of the source table.
source                   | text                            | Name of the source table or volume.
source_key_column        | text                            | Column used as the unique key in the source.
source_data_column       | text                            | Column containing the data to process.
destination_type         | aidb.PipelineDestinationType    | Whether the destination is a table or volume.
destination_schema       | text                            | Schema of the destination table.
destination              | text                            | Name of the destination table or volume.
destination_key_column   | text                            | Key column in the destination table.
destination_data_column  | text                            | Column in the destination where processed data is written.
steps                    | jsonb                           | Ordered array of pipeline step definitions.
auto_processing          | aidb.PipelineAutoProcessingMode | Auto-processing mode.
batch_size               | integer                         | Number of records processed per batch.
background_sync_interval | interval                        | Interval between executions in background mode.
owner_role               | text                            | Postgres role that owns this pipeline.

Example

SELECT name, source, destination, auto_processing FROM aidb.pipelines;
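
Because steps is a JSONB array, it can be expanded into one row per step with standard Postgres JSON functions. The sketch below assumes nothing about the internal structure of each step definition (which this page doesn't document) and returns each element as raw JSONB:

```sql
-- One row per pipeline step, in the order the steps are stored.
SELECT p.name,
       s.step_number,
       s.step_definition
FROM aidb.pipelines AS p
CROSS JOIN LATERAL jsonb_array_elements(p.steps)
     WITH ORDINALITY AS s(step_definition, step_number)
ORDER BY p.name, s.step_number;
```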

aidb.pipeline_metrics

Also accessible as aidb.pipem. Shows current processing statistics for each pipeline.

Column                     | Type   | Description
---------------------------+--------+------------
pipeline                   | text   | Name of the pipeline.
auto processing            | text   | Current auto-processing mode.
table: unprocessed rows    | bigint | For table sources: number of rows not yet processed.
volume: scans completed    | bigint | For volume sources: number of full scans completed.
count(source records)      | bigint | Total number of records in the source.
count(destination records) | bigint | Total number of records in the destination.
Status                     | text   | Current pipeline status.
count(record errors)       | bigint | Total number of records that failed processing.
count(blocking errors)     | bigint | Total number of errors that prevent the pipeline from running.

Example

SELECT * FROM aidb.pipeline_metrics;
Output
       pipeline       | auto processing | table: unprocessed rows | volume: scans completed | count(source records) | count(destination records) |  Status  | count(record errors) | count(blocking errors)
----------------------+-----------------+-------------------------+-------------------------+-----------------------+----------------------------+----------+----------------------+------------------------
 pipeline__7471a      | Background      |                       0 |                         |                     5 |                          5 | UpToDate |                    0 |                      0
 pipeline__7471b      | Background      |                       0 |                         |                     5 |                          5 | UpToDate |                    0 |                      0
 animal_facts_kb      | Disabled        |                       0 |                         |                    99 |                        372 | UpToDate |                    0 |                      0
 animal_facts_kb_bert | Disabled        |                       0 |                         |                    99 |                        372 | UpToDate |                    0 |                      0
 mpkb_pipe_int        | Disabled        |                       0 |                         |                     2 |                          4 | UpToDate |                    0 |                      0
 mpkb_pipe_text       | Disabled        |                       0 |                         |                     2 |                          4 | UpToDate |                    0 |                      0
(6 rows)
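
Several column names in this view contain spaces, parentheses, or uppercase letters, so they need double quotes when referenced individually. The following sketch, which assumes the column names shown in the output above, lists only pipelines that need attention:

```sql
-- Pipelines that reported errors or still have unprocessed rows.
-- Double quotes are required for the mixed-case and punctuated column names.
SELECT pipeline,
       "Status",
       "count(record errors)"   AS record_errors,
       "count(blocking errors)" AS blocking_errors
FROM aidb.pipeline_metrics
WHERE "Status" IN ('Failed', 'PartialErrors', 'BlockingErrors')
   OR "table: unprocessed rows" > 0;
```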

Functions

aidb.create_pipeline

Creates a new pipeline with a source, up to 10 sequential processing steps, and an optional destination.

Parameters

Parameter                         | Type                            | Default  | Description
----------------------------------+---------------------------------+----------+------------
name                              | TEXT                            | Required | Name of the pipeline. Max 50 characters.
source                            | TEXT                            | Required | Name of the source table or volume.
step_1                            | aidb.PipelineStepOperation      | Required | Operation for the first pipeline step.
source_key_column                 | TEXT                            | NULL     | Unique key column in the source table.
source_data_column                | TEXT                            | NULL     | Column containing the data to process.
destination                       | TEXT                            | NULL     | Name of the destination table or volume.
auto_processing                   | aidb.PipelineAutoProcessingMode | NULL     | Auto-processing mode.
batch_size                        | INT                             | NULL     | Number of records to process per batch.
background_sync_interval          | INTERVAL                        | NULL     | Interval between background executions. Must be between 1 second and 2 days.
owner_role                        | TEXT                            | NULL     | Role to own and execute this pipeline.
step_1_options                    | JSONB                           | NULL     | Configuration for step 1 (use the appropriate step config helper).
step_2 to step_10                 | aidb.PipelineStepOperation      | NULL     | Operation for steps 2–10.
step_2_options to step_10_options | JSONB                           | NULL     | Configuration for steps 2–10.

Returns

Column                  | Type | Description
------------------------+------+------------
name                    | text | Name of the created pipeline.
destination_type        | text | Type of the pipeline destination.
destination_schema      | text | Schema of the destination.
destination             | text | Name of the destination.
destination_key_column  | text | Key column in the destination.
destination_data_column | text | Data column in the destination.

Example

-- Single-step pipeline: chunk text from a table into a destination table
SELECT aidb.create_pipeline(
    name                => 'my_chunker',
    source              => 'source_docs',
    source_key_column   => 'id',
    source_data_column  => 'body',
    destination         => 'chunked_docs',
    step_1              => 'ChunkText',
    step_1_options      => aidb.chunk_text_config(200, 250, 25),
    auto_processing     => 'Live'
);

-- Multi-step pipeline: parse PDF, then embed into a knowledge base
SELECT aidb.create_pipeline(
    name                => 'pdf_to_kb',
    source              => 'pdf_volume',
    destination         => 'my_kb',
    step_1              => 'ParsePdf',
    step_1_options      => aidb.pdf_parse_config(),
    step_2              => 'KnowledgeBase',
    step_2_options      => aidb.knowledge_base_config('my_model', 'Text'),
    auto_processing     => 'Background',
    background_sync_interval => '60 seconds'
);

aidb.update_pipeline

Updates the auto-processing settings for an existing pipeline.

Parameters

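Parameter                | Type                            | Default  | Description
-------------------------+---------------------------------+----------+------------
name                     | TEXT                            | Required | Name of the pipeline to update.
auto_processing          | aidb.PipelineAutoProcessingMode | NULL     | New auto-processing mode.
batch_size               | INT                             | NULL     | New batch size.
background_sync_interval | INTERVAL                        | NULL     | New background sync interval.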

Example

SELECT aidb.update_pipeline('my_chunker', auto_processing => 'Background', background_sync_interval => '5 minutes');
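
An update can be verified by reading the stored settings back from the aidb.pipelines view. The sketch below reuses the my_chunker pipeline from the examples above and switches it to manual processing:

```sql
-- Turn off automatic processing for an existing pipeline.
SELECT aidb.update_pipeline('my_chunker', auto_processing => 'Disabled');

-- Read the settings back to confirm the change.
SELECT name, auto_processing, batch_size, background_sync_interval
FROM aidb.pipelines
WHERE name = 'my_chunker';
```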

aidb.delete_pipeline

Deletes a pipeline and its configuration. Doesn't delete the source or destination tables.

Parameters

Parameter | Type | Default  | Description
----------+------+----------+------------
name      | TEXT | Required | Name of the pipeline to delete.

Example

SELECT aidb.delete_pipeline('my_chunker');

aidb.run_pipeline

Manually triggers a pipeline to execute immediately, regardless of its auto_processing mode.

Parameters

Parameter     | Type | Default  | Description
--------------+------+----------+------------
pipeline_name | TEXT | Required | Name of the pipeline to run.

Example

SELECT aidb.run_pipeline('my_chunker');
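
A manual workflow typically combines a pipeline created with auto-processing disabled, an explicit run, and a look at the metrics view. The following is a sketch only: the pipeline, table, and column names (manual_chunker, source_docs, manual_chunks, id, body) are hypothetical and would need to match an existing source table.

```sql
-- 1. Create a pipeline that never runs on its own.
SELECT aidb.create_pipeline(
    name               => 'manual_chunker',
    source             => 'source_docs',
    source_key_column  => 'id',
    source_data_column => 'body',
    destination        => 'manual_chunks',
    step_1             => 'ChunkText',
    step_1_options     => aidb.chunk_text_config(200, 250, 25),
    auto_processing    => 'Disabled'
);

-- 2. Trigger processing explicitly.
SELECT aidb.run_pipeline('manual_chunker');

-- 3. Check what remains to be processed and the resulting status.
SELECT pipeline, "table: unprocessed rows", "Status"
FROM aidb.pipeline_metrics
WHERE pipeline = 'manual_chunker';
```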

Pipeline step config helpers

These functions return a JSONB configuration object for use in step_N_options parameters of aidb.create_pipeline.

aidb.chunk_text_config

Configures a ChunkText step to split text into smaller segments.

Parameters

Parameter      | Type    | Default  | Description
---------------+---------+----------+------------
desired_length | INTEGER | Required | Target chunk size.
max_length     | INTEGER | NULL     | Maximum allowed chunk size.
overlap_length | INTEGER | NULL     | Number of units to overlap between consecutive chunks.
strategy       | TEXT    | NULL     | Chunking unit: 'chars' (default) or 'words'.

Example

-- Chunk into ~200 character segments, max 250, with 25-character overlap
SELECT aidb.chunk_text_config(200, 250, 25, 'chars');
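
Since the helper returns JSONB, its result can be inspected with the standard Postgres JSON functions before it's passed to aidb.create_pipeline. This sketch makes no assumption about the key names inside the returned object, which this page doesn't document:

```sql
-- Pretty-print the generated configuration to review it.
SELECT jsonb_pretty(aidb.chunk_text_config(200, 250, 25, 'chars'));

-- Compare character-based and word-based configurations side by side.
SELECT aidb.chunk_text_config(200, 250, 25, 'chars') AS by_chars,
       aidb.chunk_text_config(50, 60, 5, 'words')    AS by_words;
```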

aidb.summarize_text_config

Configures a SummarizeText step to summarize text using a language model.

Parameters

Parameter        | Type    | Default  | Description
-----------------+---------+----------+------------
model            | TEXT    | Required | Name of the model to use for summarization.
chunk_config     | JSONB   | NULL     | Optional chunking config (from aidb.chunk_text_config) applied before summarizing.
prompt           | TEXT    | NULL     | Custom prompt to guide the summarization.
strategy         | TEXT    | NULL     | 'append' (default) or 'reduce'.
reduction_factor | INTEGER | NULL     | With 'reduce' strategy: aggressiveness of each reduction pass (default: 3).
inference_config | JSONB   | NULL     | Optional inference settings (from aidb.inference_config).

Example

SELECT aidb.summarize_text_config(
    'my_llm',
    chunk_config => aidb.chunk_text_config(100, 100, 10, 'words'),
    prompt       => 'Summarize concisely',
    strategy     => 'reduce',
    reduction_factor => 4
);

aidb.ocr_config

Configures a PerformOcr step to extract text from images using a model.

Parameters

Parameter | Type | Default  | Description
----------+------+----------+------------
model     | TEXT | Required | Name of the OCR model to use.

Example

SELECT aidb.ocr_config('my_ocr_model');

aidb.html_parse_config

Configures a ParseHtml step to extract text from HTML content.

Parameters

Parameter | Type | Default | Description
----------+------+---------+------------
method    | TEXT | NULL    | Parsing method to use. If NULL, uses the default method.

Example

SELECT aidb.html_parse_config();
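
The resulting configuration is meant to be passed as a step option to aidb.create_pipeline. The sketch below chains ParseHtml with ChunkText; the pipeline, table, and column names (html_chunker, web_pages, web_page_chunks, id, html) are hypothetical:

```sql
-- Two-step pipeline: extract text from HTML, then split it into chunks.
SELECT aidb.create_pipeline(
    name               => 'html_chunker',
    source             => 'web_pages',
    source_key_column  => 'id',
    source_data_column => 'html',
    destination        => 'web_page_chunks',
    step_1             => 'ParseHtml',
    step_1_options     => aidb.html_parse_config(),
    step_2             => 'ChunkText',
    step_2_options     => aidb.chunk_text_config(200, 250, 25)
);
```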

aidb.pdf_parse_config

Configures a ParsePdf step to extract content from PDF documents.

Parameters

Parameter             | Type    | Default | Description
----------------------+---------+---------+------------
method                | TEXT    | NULL    | Parsing method to use. If NULL, uses the default method.
allow_partial_parsing | BOOLEAN | NULL    | When true, returns partial results if some pages cannot be parsed.

Example

SELECT aidb.pdf_parse_config(allow_partial_parsing => true);

aidb.knowledge_base_config

Configures a KnowledgeBase step to compute and store embeddings.

Parameters

Parameter         | Type                    | Default  | Description
------------------+-------------------------+----------+------------
model             | TEXT                    | Required | Name of the embedding model to use.
data_format       | aidb.PipelineDataFormat | Required | Format of the data being embedded.
distance_operator | aidb.DistanceOperator   | NULL     | Distance function for similarity search. Defaults to L2.
vector_index      | JSONB                   | NULL     | Vector index configuration (from a vector index config helper).

Example

SELECT aidb.knowledge_base_config(
    'my_embedding_model',
    'Text',
    distance_operator => 'Cosine',
    vector_index      => aidb.vector_index_hnsw_config(m => 16, ef_construction => 64)
);

aidb.knowledge_base_config_from_kb

Configures a KnowledgeBase step to attach a pipeline to an existing knowledge base rather than creating a new one.

Parameters

Parameter   | Type                    | Default  | Description
------------+-------------------------+----------+------------
data_format | aidb.PipelineDataFormat | Required | Format of the data being embedded.

Example

SELECT aidb.knowledge_base_config_from_kb('Text');

aidb.inference_config

Builds an inference configuration object for use with language model steps such as SummarizeText. All parameters are optional.

Parameters

Parameter      | Type             | Default | Description
---------------+------------------+---------+------------
system_prompt  | TEXT             | NULL    | System prompt prepended to each request.
temperature    | DOUBLE PRECISION | NULL    | Sampling temperature (higher = more random).
max_tokens     | INTEGER          | NULL    | Maximum number of tokens to generate.
top_p          | DOUBLE PRECISION | NULL    | Nucleus sampling threshold.
seed           | BIGINT           | NULL    | Random seed for reproducible outputs.
repeat_penalty | REAL             | NULL    | Penalty for repeated tokens.
repeat_last_n  | INTEGER          | NULL    | Number of recent tokens to apply repeat penalty over.
thinking       | BOOLEAN          | NULL    | Enable extended reasoning (supported models only).
extra_args     | JSONB            | NULL    | Additional provider-specific inference arguments.

Example

SELECT aidb.inference_config(
    system_prompt => 'You are a technical summarizer.',
    temperature   => 0.3,
    max_tokens    => 512
);
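
The object returned by aidb.inference_config is intended for the inference_config parameter of aidb.summarize_text_config. A minimal sketch of nesting the two helpers, where my_llm is a placeholder model name as in the earlier examples:

```sql
-- Summarization step config with explicit inference settings.
SELECT aidb.summarize_text_config(
    'my_llm',
    strategy         => 'append',
    inference_config => aidb.inference_config(
        system_prompt => 'You are a technical summarizer.',
        temperature   => 0.3,
        max_tokens    => 512
    )
);
```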

Vector index config helpers

These functions return a JSONB configuration for the vector_index parameter of aidb.knowledge_base_config.

aidb.vector_index_hnsw_config

Configures a pgvector HNSW index.

Parameters

Parameter        | Type    | Default | Description
-----------------+---------+---------+------------
vector_data_type | TEXT    | NULL    | Vector storage type.
m                | INTEGER | NULL    | Maximum number of connections per node (default: 16).
ef_construction  | INTEGER | NULL    | Build-time search depth (default: 64).
ef_search        | INTEGER | NULL    | Query-time search depth.
Note
HNSW supports a maximum of 2000 dimensions. For higher-dimensional vectors, use aidb.vector_index_disabled_config().

The following table shows how each distance_operator value maps to a pgvector ops class:

distance_operator | Index ops class
------------------+----------------
L2                | vector_l2_ops
InnerProduct      | vector_ip_ops
Cosine            | vector_cosine_ops
L1                | vector_l1_ops

Example

SELECT aidb.vector_index_hnsw_config(m => 16, ef_construction => 64);

aidb.vector_index_ivfflat_config

Configures a pgvector IVFFlat index.

Parameters

Parameter        | Type    | Default | Description
-----------------+---------+---------+------------
vector_data_type | TEXT    | NULL    | Vector storage type.
lists            | INTEGER | NULL    | Number of clusters (inverted lists).
probes           | INTEGER | NULL    | Number of clusters to search at query time.

Example

SELECT aidb.vector_index_ivfflat_config(lists => 100);
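
Like the other vector index helpers, the result is passed to the vector_index parameter of aidb.knowledge_base_config. A sketch with placeholder values, where the model name and the lists and probes settings are illustrative rather than recommendations:

```sql
-- Knowledge base step config that uses an IVFFlat index with L2 distance.
SELECT aidb.knowledge_base_config(
    'my_embedding_model',
    'Text',
    distance_operator => 'L2',
    vector_index      => aidb.vector_index_ivfflat_config(lists => 100, probes => 10)
);
```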

aidb.vector_index_chord_hnsw_config

Configures a VectorChord HNSW index.

Parameters

Parameter        | Type             | Default | Description
-----------------+------------------+---------+------------
vector_data_type | TEXT             | NULL    | Vector storage type.
m                | INTEGER          | NULL    | Maximum connections per node.
ef_construction  | INTEGER          | NULL    | Build-time search depth.
max_connections  | INTEGER          | NULL    | Maximum connections in the graph.
ml               | DOUBLE PRECISION | NULL    | Level multiplier controlling graph layer structure.

Example

SELECT aidb.vector_index_chord_hnsw_config(m => 16, ef_construction => 64);

aidb.vector_index_chord_vchordq_config

Configures a VectorChord Vchordq index.

Parameters

Parameter           | Type    | Default | Description
--------------------+---------+---------+------------
vector_data_type    | TEXT    | NULL    | Vector storage type.
lists               | TEXT    | NULL    | Number of clusters.
spherical_centroids | BOOLEAN | NULL    | Use spherical (normalized) centroids when clustering.

Example

SELECT aidb.vector_index_chord_vchordq_config(lists => '100', spherical_centroids => true);

aidb.vector_index_hsphere_optimized_config

Configures an HSphere Optimized index.

Parameters

Parameter        | Type             | Default  | Description
-----------------+------------------+----------+------------
clusters         | INTEGER          | Required | Number of clusters.
precision_val    | DOUBLE PRECISION | Required | Indexing precision value.
vector_data_type | TEXT             | NULL     | Vector storage type.

Example

SELECT aidb.vector_index_hsphere_optimized_config(clusters => 256, precision_val => 0.95);

aidb.vector_index_disabled_config

Disables automatic vector index creation. Use this when your embedding dimensions exceed 2000 or when you want to manage indexes manually.

Example

SELECT aidb.vector_index_disabled_config();
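
In practice this helper is passed to aidb.knowledge_base_config so that the knowledge base is created without an automatic index. A minimal sketch, where my_large_embedding_model is a hypothetical model whose output exceeds 2000 dimensions:

```sql
-- Knowledge base step config with automatic index creation turned off.
SELECT aidb.knowledge_base_config(
    'my_large_embedding_model',
    'Text',
    vector_index => aidb.vector_index_disabled_config()
);
```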

Model config helpers

Model config helpers have moved to the Models reference page.