AI pipelines v7

A pipeline is the core building block of AIDB. It defines how raw data — from a Postgres table or an external volume — flows through a sequence of transformation steps and lands in an AI-ready destination.

Source  →  Step 1       →  Step 2  →  ...  →  Knowledge Base
           (parse/chunk)   (embed)             (indexed + queryable)

Each step handles one transformation: parsing a PDF, chunking text, running OCR, summarizing content, or generating vector embeddings. The output of one step becomes the input for the next. At the end of the pipeline, your data is embedded, indexed, and ready to query with semantic or hybrid search.

Pipelines can run on demand, in batch, or automatically whenever source data changes — keeping your knowledge base in sync without manual intervention.
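To make the flow concrete, here is a minimal SQL sketch of defining a pipeline. The function name `aidb.create_pipeline()` comes from this documentation, but the parameter names, step syntax, and mode values shown here are illustrative assumptions, not the verified API — see the Creating pipelines and Reference pages for the actual signature.

```sql
-- Hypothetical sketch only: parameter names and the JSON step format
-- are assumptions for illustration, not the verified signature.
SELECT aidb.create_pipeline(
    name   => 'docs_pipeline',
    source => 'public.documents',       -- a Postgres table or external volume
    steps  => '[
        {"type": "ParsePdf"},           -- parse raw PDFs into text
        {"type": "ChunkText"},          -- split text into chunks
        {"type": "KnowledgeBase", "name": "docs_kb"}  -- embed + index
    ]'::jsonb,
    auto_processing => 'Live'           -- re-run when source data changes
);
```

Each entry in `steps` consumes the previous step's output, ending in a knowledge base that is indexed and queryable with semantic or hybrid search.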

| Page | What it covers |
| --- | --- |
| Overview | Core concepts: sources, steps, destinations, and how pipelines relate to knowledge bases. |
| Creating pipelines | Defining a pipeline with `aidb.create_pipeline()` — source, steps, auto-processing, and volume sources. |
| Pipeline steps | Available step types: `ChunkText`, `ParseHtml`, `ParsePdf`, `PerformOcr`, `SummarizeText`, `KnowledgeBase`. |
| Orchestration | Auto-processing modes, background workers, observability, and error handling. |
| Reference | Full API reference for pipeline types, views, CRUD functions, and config helpers. |
| Example | End-to-end worked example. |