Hybrid search combines semantic (vector/embedding) search on knowledge base embeddings with one or more additional filter mechanisms:
- Relational predicate filtering: Filter results by any column in the source table, for example WHERE category = 'vegetarian' or WHERE published_date > '2024-01-01'.
- BM25 full-text search: Rank results by keyword relevance using a BM25 index, ensuring that specific words or phrases appear in the results.
These two approaches can be used independently or together alongside semantic search.
AI pipelines handle knowledge base construction, including embedding generation, as normal. Hybrid search is applied at query time by constructing custom SQL that combines the vector table with your source table. It is not a separately installed feature and no additional setup is required beyond the prerequisites below.
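As a rough sketch of what such a query-time join can look like, the example below filters on a source-table column while ordering by vector distance. Every object name here is hypothetical: recipes (source table), recipes_kb_vector (embeddings table) with source_id and embedding columns, and the recipes_kb knowledge base. The cosine distance operator `<=>` is also an assumption, as is the result of aidb.kb_query_encode() being directly comparable to the stored embeddings. Check your knowledge base metadata for the real table name, key column, and distance operator.

```sql
-- Hypothetical schema: recipes(id, title, category) is the source table,
-- recipes_kb_vector(source_id, embedding) is the embeddings table.
SELECT r.id,
       r.title
FROM recipes_kb_vector AS v
JOIN recipes AS r
  ON r.id = v.source_id                 -- assumed join key
WHERE r.category = 'vegetarian'         -- relational predicate on the source table
ORDER BY v.embedding <=> aidb.kb_query_encode('recipes_kb', 'quick weeknight dinner')
LIMIT 5;
```

Ordering by the distance expression and applying LIMIT is what lets the vector index drive the query, with the WHERE clause narrowing the candidates it returns.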
Prerequisites
pgvector iterative index scans are required for all hybrid query patterns. BM25 and native PostgreSQL full-text search (FTS) are needed only if your queries include the corresponding keyword-search component.
pgvector iterative index scans
Requires pgvector 0.8.0 or later. Enable iterative index scans before running hybrid queries — they allow the vector index to keep producing candidates until your LIMIT is satisfied, even after filtering. See Write hybrid search queries for the SET commands.
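For orientation, pgvector 0.8.0 exposes iterative scans through one setting per index type. The snippet below uses the setting names from the pgvector documentation. Which mode suits your workload is a judgment call, and this page does not say which values the linked query guide recommends.

```sql
-- HNSW indexes: strict_order keeps exact distance ordering;
-- relaxed_order allows slightly out-of-order results for better recall.
SET hnsw.iterative_scan = relaxed_order;

-- IVFFlat indexes support relaxed_order only.
SET ivfflat.iterative_scan = relaxed_order;
```

To scope a setting to a single query, run SET LOCAL inside a transaction instead of a session-level SET.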
BM25 (optional)
BM25 full-text filtering requires the vchord_bm25 and pg_tokenizer extensions. You install and manage these extensions yourself. See Setting up user-managed BM25 for setup details.
Native PostgreSQL FTS (no extension required)
Standard PostgreSQL full-text search (FTS), based on tsvector and ts_rank, is available in every Postgres installation without additional extensions. See Setting up native PostgreSQL FTS for setup steps and a comparison with BM25.
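A minimal sketch of a native FTS setup follows, using only built-in PostgreSQL functions. The recipes table, its title and body columns, the search_tsv column, and the index and trigger names are hypothetical. The linked setup page may use a different layout.

```sql
-- Hypothetical source table.
CREATE TABLE IF NOT EXISTS recipes (
    id    bigint PRIMARY KEY,
    title text,
    body  text
);

-- Add a tsvector column and backfill it for existing rows.
ALTER TABLE recipes ADD COLUMN search_tsv tsvector;

UPDATE recipes
SET search_tsv = to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));

-- GIN index to speed up @@ matches.
CREATE INDEX recipes_search_tsv_idx ON recipes USING GIN (search_tsv);

-- Keep the column current on insert and update with the built-in trigger function.
CREATE TRIGGER recipes_search_tsv_update
BEFORE INSERT OR UPDATE ON recipes
FOR EACH ROW EXECUTE FUNCTION
    tsvector_update_trigger(search_tsv, 'pg_catalog.english', title, body);

-- Keyword query ranked with ts_rank.
SELECT r.id,
       ts_rank(r.search_tsv, q) AS rank
FROM recipes AS r,
     websearch_to_tsquery('english', 'lentil soup') AS q
WHERE r.search_tsv @@ q
ORDER BY rank DESC
LIMIT 10;
```

The EXECUTE FUNCTION syntax and websearch_to_tsquery() both require PostgreSQL 11 or later.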
Volume source limitation
Hybrid search with relational predicate filtering or BM25 is not supported for volume-sourced knowledge bases. Volume sources have no backing Postgres table to join against, so there's no source table to filter or index.
Volume-sourced knowledge bases continue to support standard semantic search via aidb.retrieve_text() and aidb.retrieve_key().
Helper functions
AIDB provides two functions for constructing hybrid queries:
- aidb.kb_query_encode(kb_name, query_text): Encodes a text query using the embedding model configured for a specific knowledge base. Use this instead of aidb.encode_text_query() when writing custom vector queries, so you don't need to look up the model name separately.
- aidb.rerank_text(): Optionally reranks retrieved candidates using a reranking model as a post-processing step after retrieval.
The aidb.knowledge_bases view (also accessible as aidb.kbs) exposes knowledge base metadata — including the embeddings table name, distance operator, and attached pipelines — which you can query to construct dynamic hybrid queries. See Reference for full function signatures and the view schema.
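A quick way to see what the view exposes is to select every column. This page does not list the column names, so the sketch below deliberately avoids naming any.

```sql
-- Inspect all knowledge base metadata.
SELECT * FROM aidb.knowledge_bases;

-- Same view through its shorter alias.
SELECT * FROM aidb.kbs;
```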
Next steps
- Setting up user-managed BM25: Install extensions, create a tokenizer, and add a BM25 column and index to your source table.
- Setting up native PostgreSQL FTS: Add a tsvector column, GIN index, and trigger to your source table.
- Query patterns: SQL patterns for relational filters, BM25 filters, Reciprocal Rank Fusion (RRF), and weighted linear fusion, with a worked example that includes sample data and expected output.
- Fallback and degradation: How to handle embedding service outages gracefully.
- Reference: Full reference for aidb.kb_query_encode(), aidb.rerank_text(), and the aidb.knowledge_bases view.