Knowledge bases reference v7
Reference for knowledge base functions and views. For guide-style documentation, see Knowledge bases.
Views
aidb.knowledge_bases
Also accessible as aidb.kbs. Lists all knowledge bases in the database with their full configuration.
| Column | Type | Description |
|---|---|---|
| id | integer | Internal identifier |
| name | text | Knowledge base name |
| vector_schema | text | Schema of the embeddings table |
| vector_table | text | Embeddings storage table |
| model_name | text | Embedding model used |
| distance_operator | aidb.DistanceOperator | Vector distance function used for retrieval |
| distance_operator_sql | text | The distance operator in SQL-operator syntax (e.g. <->, <=>) |
| vector_data_column | text | Embeddings column |
| vector_key_column | text | Key column in the embeddings table |
| vector_index | jsonb | Vector index configuration |
| pipeline_ids | integer[] | IDs of all pipelines attached to this knowledge base |
| pipeline_names | text[] | Names of all pipelines attached to this knowledge base |
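As an illustration, a query along these lines summarizes each knowledge base's configuration. It is a sketch that relies only on the columns documented above; the embeddings_table alias is introduced here for readability and is not part of the view.

```sql
-- Summarize the configuration of every knowledge base,
-- using only columns documented for aidb.knowledge_bases.
SELECT name,
       model_name,
       distance_operator,
       distance_operator_sql,
       vector_schema || '.' || vector_table AS embeddings_table,  -- illustrative alias
       pipeline_names
FROM aidb.knowledge_bases
ORDER BY name;
```

The same query should work against the aidb.kbs alias.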
aidb.knowledge_base_stats
Also accessible as aidb.kbstat. Provides current processing statistics for all knowledge base pipelines.
| Column | Type | Description |
|---|---|---|
| name | text | Knowledge base name |
| pipelines | integer | Number of pipelines attached |
| embeddings | bigint | Number of embeddings stored (previously reported as count(embeddings)) |
| status | text | Same status field as in aidb.pipeline_metrics, reporting the "worst" status across all connected pipelines |
Example
SELECT * FROM aidb.knowledge_base_stats;
 name                                 | pipelines | embeddings | status
--------------------------------------+-----------+------------+----------
 public.pipeline_pipeline__7471a      |         1 |          5 | UpToDate
 public.pipeline_pipeline__7471b      |         1 |          5 | UpToDate
 public.pipeline_animal_facts_kb      |         1 |        372 | UpToDate
 public.pipeline_animal_facts_kb_bert |         1 |        372 | UpToDate
 public.mpkb_shared_vectors           |         2 |          4 | UpToDate
(5 rows)
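Because status reflects the worst status among the attached pipelines, the view can serve as a quick health check. The following is a minimal sketch; UpToDate is the only status value shown in this reference, so the filter simply excludes it rather than naming other values.

```sql
-- List knowledge bases whose aggregated pipeline status is anything other than UpToDate.
SELECT name, pipelines, embeddings, status
FROM aidb.knowledge_base_stats
WHERE status <> 'UpToDate'
ORDER BY name;
```

An empty result would suggest that every attached pipeline has caught up with its source data.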
Types
aidb.DistanceOperator
Vector distance function used during retrieval.
| Value | Description |
|---|---|
| L2 | Euclidean distance |
| InnerProduct | Inner product |
| Cosine | Cosine similarity |
| L1 | L1 distance |
| Hamming | Hamming distance |
| Jaccard | Jaccard distance |
aidb.PipelineDataFormat
Format of data in a pipeline source or volume.
| Value | Description |
|---|---|
| Text | Plain text |
| Image | Binary image data |
| Pdf | PDF documents |
aidb.PipelineAutoProcessingMode
Auto-processing mode for a knowledge base pipeline.
| Value | Description |
|---|---|
| Live | Immediate processing via Postgres triggers on each data change |
| Background | Periodic processing via a Postgres background worker |
| Disabled | No automatic processing; run manually with aidb.bulk_embedding() |
Knowledge base functions
aidb.set_auto_knowledge_base
Updates the auto-processing mode for an existing knowledge base.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| mode | aidb.PipelineAutoProcessingMode | Required | New auto-processing mode. |
| batch_size | INTEGER | NULL | Records processed per batch (Background and Disabled modes). |
| background_sync_interval | INTERVAL | NULL | Polling interval (Background mode). |
Examples
SELECT aidb.set_auto_knowledge_base('my_kb', 'Live');
SELECT aidb.set_auto_knowledge_base('my_kb', 'Background', background_sync_interval => '1 minute');
SELECT aidb.set_auto_knowledge_base('my_kb', 'Disabled', batch_size => 200);
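The optional parameters can also be combined. The sketch below sets both a batch size and a polling interval for Background mode and then checks the aggregated status. The values 500 and '30 seconds' are arbitrary, and the suffix match on name is an assumption based on the schema-qualified names that appear in the aidb.knowledge_base_stats example output.

```sql
-- Switch to background processing with an explicit batch size and polling interval.
SELECT aidb.set_auto_knowledge_base(
    'my_kb',
    'Background',
    batch_size => 500,                        -- arbitrary illustrative value
    background_sync_interval => '30 seconds'  -- arbitrary illustrative value
);

-- Check the aggregated pipeline status afterwards.
-- Assumption: names in the stats view are schema-qualified, hence the suffix match.
SELECT name, embeddings, status
FROM aidb.knowledge_base_stats
WHERE name LIKE '%my_kb';
```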
aidb.retrieve_key
Returns the source record keys and distances for the top matching embeddings, without fetching source data.
The part_ids column also reports, for each step in a multi-step pipeline, the IDs of the parts that match the query. For example, if a single source record is split into 10 parts by a "ChunkText" step, part_ids indicates which chunk matched.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| query | TEXT | Required | Query text to search with. |
| topk | INTEGER | 1 | Number of results to return. |
| deduplicate | BOOLEAN | true | Return each source result only once, even if multiple "parts" (e.g. chunks/pages) match. |
Returns
| Column | Type | Description |
|---|---|---|
| key | text | Source record key |
| distance | double precision | Vector distance from the query |
| part_ids | bigint[] | IDs of the matching parts, i.e. the pipeline step results (chunks or pages) |
| pipeline_name | text | The name of the pipeline processing this source record |
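Since this function returns only keys, a caller typically joins them back to the source table. The following is a sketch under stated assumptions: animal_facts(id integer, fact text) is a hypothetical source table that doesn't appear in this reference, and its key is cast to text to match the key column.

```sql
-- Hypothetical source table: animal_facts(id integer, fact text).
-- retrieve_key returns key as text, so the source key is cast for the join.
SELECT src.id, src.fact, r.distance, r.part_ids
FROM aidb.retrieve_key('public.pipeline_animal_facts_kb', 'birds', topk => 3) AS r
JOIN animal_facts AS src ON src.id::text = r.key
ORDER BY r.distance;
```

This keeps the similarity search separate from fetching the source rows, which can help when only some columns of a wide source table are needed.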
Example
SELECT * FROM aidb.retrieve_key('public.pipeline_animal_facts_kb', 'birds', topk=>3);
key | distance | part_ids | pipeline_name
-----+--------------------+----------+-----------------
5 | 1.1931772046185758 | {0,0} | animal_facts_kb
93 | 1.1980633963685476 | {2,0} | animal_facts_kb
98 | 1.2150919866080878 | {2,0} | animal_facts_kb
(3 rows)
aidb.retrieve_text
Returns source text and distances for the top matching embeddings by joining the embeddings table with the source table.
The part_ids column also reports, for each step in a multi-step pipeline, the IDs of the parts that match the query. For example, if a single source record is split into 10 parts by a "ChunkText" step, part_ids indicates which chunk matched.
The intermediate_steps return column contains the actual text produced by the intermediate steps.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| query_string | TEXT | Required | Query text to search with. |
| number_of_results | INTEGER | 0 | Number of results to return. Uses topk default if 0. |
| query | TEXT | Required | Query text to search with. |
| topk | INTEGER | 1 | Number of results to return. |
| deduplicate | BOOLEAN | true | Return each source result only once, even if multiple "parts" (e.g. chunks/pages) match. |
Returns
| Column | Type | Description |
|---|---|---|
| key | text | Source record key |
| value | text | Source text content |
| distance | double precision | Vector distance from the query |
| part_ids | bigint[] | IDs of the matching parts, i.e. the pipeline step results (chunks or pages) |
| pipeline_name | text | The name of the pipeline processing this source record |
| intermediate_steps | jsonb | The matching results from the intermediate steps, i.e. the values belonging to part_ids |
Example
SELECT * FROM aidb.retrieve_text('my_kb', 'waterproof jacket', 3);
 key   | value                               | distance
-------+-------------------------------------+--------------------
 19337 | Men Stripes Waterproof Shell Jacket | 0.2994317672742334
 55018 | Women All-Weather Hiking Anorak     | 0.3804609668507203
(2 rows)
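To inspect which parts matched, deduplication can be turned off so that every matching chunk or page gets its own row. This is a sketch only: it assumes deduplicate can be passed by name after the positional arguments, as the parameter table suggests.

```sql
-- Return one row per matching part instead of one row per source record.
-- Assumption: deduplicate is accepted as a named argument.
SELECT key, distance, part_ids, intermediate_steps
FROM aidb.retrieve_text('my_kb', 'waterproof jacket', 3, deduplicate => false)
ORDER BY distance;
```

With deduplicate left at its default of true, each source record would appear at most once.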
aidb.delete_knowledge_base
Removes a knowledge base, that is, its destination vector table and its configuration, along with all attached pipelines. The underlying source table is not deleted.
Parameters
| Parameter | Type | Description |
|---|---|---|
| knowledge_base_name | TEXT | Name of the knowledge base to delete. |
Example
SELECT aidb.delete_knowledge_base('my_kb');
Volume functions
aidb.create_volume
Creates an AIDB volume from a PGFS storage location for use as a knowledge base data source. See External data sources for setting up the PGFS storage location first.
Parameters
| Parameter | Type | Description |
|---|---|---|
| name | TEXT | Name for the volume. |
| server_name | TEXT | Name of the PGFS storage location. |
| path | TEXT | Sub-path within the storage location. |
| mime_type | TEXT | Data type: Text, Image, or Pdf. |
Example
SELECT aidb.create_volume('pdf_volume', 'my_s3_location', '/', 'Pdf');
aidb.list_volumes
Lists all AIDB volumes in the database.
SELECT * FROM aidb.list_volumes();
aidb.delete_volume
Deletes an AIDB volume. Note: deleting the underlying PGFS storage location also deletes all volumes built on top of it.
Parameters
| Parameter | Type | Description |
|---|---|---|
| volume_name | TEXT | Name of the volume to delete. |
SELECT aidb.delete_volume('pdf_volume');