# AI pipelines v7
A pipeline is the core building block of AIDB. It defines how raw data — from a Postgres table or an external volume — flows through a sequence of transformation steps and lands in an AI-ready destination.
```
Source → Step 1 → Step 2 → ... → Knowledge Base
       (parse/chunk)  (embed)    (indexed + queryable)
```

Each step handles one transformation: parsing a PDF, chunking text, running OCR, summarizing content, or generating vector embeddings. The output of one step becomes the input for the next. At the end of the pipeline, your data is embedded, indexed, and ready to query with semantic or hybrid search.
Pipelines can run on demand, in batch, or automatically whenever source data changes — keeping your knowledge base in sync without manual intervention.
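As a rough illustration, a pipeline like the flow above could be defined with `aidb.create_pipeline()`, covered on the Creating pipelines page. This is only a sketch: the parameter names, the JSON step format, and the `documents` source table are assumptions, not the documented signature.

```sql
-- Hypothetical sketch: define a pipeline that parses PDFs from a source
-- table, chunks the extracted text, and lands the result in a knowledge
-- base. Parameter names and step configuration are illustrative only;
-- see the Creating pipelines and Reference pages for the real API.
SELECT aidb.create_pipeline(
    name   => 'docs_pipeline',   -- pipeline identifier (assumed parameter)
    source => 'documents',       -- source table or volume (assumed parameter)
    steps  => '[
        {"type": "ParsePdf"},
        {"type": "ChunkText"},
        {"type": "KnowledgeBase"}
    ]'::jsonb                    -- step types listed on the Pipeline steps page
);
```

The step types (`ParsePdf`, `ChunkText`, `KnowledgeBase`) come from the Pipeline steps page below; how they are wired together in a real call is documented under Creating pipelines.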
| Page | What it covers |
|---|---|
| Overview | Core concepts: sources, steps, destinations, and how pipelines relate to knowledge bases. |
| Creating pipelines | Defining a pipeline with aidb.create_pipeline() — source, steps, auto-processing, and volume sources. |
| Pipeline steps | Available step types: ChunkText, ParseHtml, ParsePdf, PerformOcr, SummarizeText, KnowledgeBase. |
| Orchestration | Auto-processing modes, background workers, observability, and error handling. |
| Reference | Full API reference for pipeline types, views, CRUD functions, and config helpers. |
| Example | End-to-end worked example. |