Orchestration v7
Orchestration manages the execution of the pipeline, determining when and how data is processed. Here are the core components of orchestration:
Auto-processing: Pipelines can run in the background or stay in sync with your data. You can configure batch sizes, triggers, and choose between synchronous or asynchronous processing modes.
Change detection: The pipeline identifies new, updated, or deleted records so that only changed ("dirty") data is reprocessed.
Sync vs async: You choose between live (immediate, transactional updates) and background (periodic, non-blocking updates).
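The change-detection idea above can be sketched in a few lines. This is a language-agnostic illustration rather than aidb's implementation; the `fingerprint` and `detect_changes` helpers are hypothetical names used only for this example.

```python
import hashlib

def fingerprint(record: dict) -> str:
    # Hash a record's contents so updates can be detected cheaply.
    return hashlib.sha256(repr(sorted(record.items())).encode()).hexdigest()

def detect_changes(previous: dict, current: dict) -> dict:
    """Compare {id: fingerprint} snapshots and classify each record.

    Only the records returned here (the "dirty" data) need reprocessing.
    """
    return {
        "new": [k for k in current if k not in previous],
        "updated": [k for k in current
                    if k in previous and current[k] != previous[k]],
        "deleted": [k for k in previous if k not in current],
    }

# Example snapshots: record 2 was updated, 4 added, 3 removed.
before = {1: "a", 2: "b", 3: "c"}
after = {1: "a", 2: "x", 4: "d"}
print(detect_changes(before, after))
# → {'new': [4], 'updated': [2], 'deleted': [3]}
```

In a real pipeline the fingerprints would be stored alongside the source data, so each run only compares hashes instead of re-reading full record contents.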
Background workers: Background workers are dedicated Postgres processes that execute AI tasks — such as OCR, document parsing, and embedding API calls — asynchronously, without affecting the performance of your main database.
Resource isolation: Background workers run independently of user sessions, so your application stays responsive even during large data ingestions.
Batching & retries: They group data into efficient batches to save on API costs and automatically retry if an external service (like OpenAI) is temporarily down.
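The batching-and-retry behavior described above can be sketched as follows. This is an illustrative pattern, not aidb's actual worker code; `flaky_embed` stands in for an external API call (such as an embedding request to OpenAI), and the batch size and retry parameters are made up for the example.

```python
import time

def batches(items, size):
    # Group records into fixed-size batches to reduce per-call API overhead.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def call_with_retries(fn, batch, attempts=3, backoff=0.01):
    # Retry with exponential backoff if the external service is temporarily down.
    for attempt in range(attempts):
        try:
            return fn(batch)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

calls = {"n": 0}

def flaky_embed(batch):
    # Hypothetical embedding call that fails once, then succeeds.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("service temporarily down")
    return [f"vec({doc})" for doc in batch]

results = []
for batch in batches(["doc1", "doc2", "doc3"], size=2):
    results.extend(call_with_retries(flaky_embed, batch))
print(results)  # embeddings for all three documents, despite one transient failure
```

The key design point is that a transient outage costs one retried batch, not a failed pipeline run.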
Observability: Orchestration provides detailed status for your automated workflows. Because these processes run in the background, observability lets you track their health, speed, and accuracy.
Status tracking: Use views like aidb.pipeline_metrics to see how many rows are in the backlog and the total count of records in the destination.
Audit & error logs: These capture exactly why a specific file failed to parse, for example a corrupted PDF or a timeout, allowing for quick troubleshooting.
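A minimal sketch of the status and error tracking described above. The `PipelineStatus` class and its fields are illustrative only and do not reflect the actual schema of aidb.pipeline_metrics.

```python
class PipelineStatus:
    """Track backlog, completed counts, and per-record failure reasons."""

    def __init__(self, total_source_rows):
        self.total = total_source_rows
        self.done = 0
        self.errors = []  # audit log of (record_id, reason) pairs

    def record_success(self):
        self.done += 1

    def record_failure(self, record_id, reason):
        # Capture exactly why a record failed, e.g. a corrupted PDF.
        self.done += 1
        self.errors.append((record_id, reason))

    @property
    def backlog(self):
        # Rows still waiting to be processed.
        return self.total - self.done

# Hypothetical run: 5 source rows, 3 succeed, 1 fails, 1 still pending.
status = PipelineStatus(total_source_rows=5)
for _ in range(3):
    status.record_success()
status.record_failure("report.pdf", "corrupted PDF: parse timeout")
print(status.backlog)  # → 1
print(status.errors)   # → [('report.pdf', 'corrupted PDF: parse timeout')]
```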
To see orchestration applied in a complete pipeline workflow, see Example.