Knowledge bases reference v7

Reference for knowledge base functions and views. For guide-style documentation, see Knowledge bases.

Views

aidb.knowledge_bases

Also accessible as aidb.kbs. Lists all knowledge bases in the database with their full configuration.

| Column | Type | Description |
|---|---|---|
| id | integer | Internal identifier |
| name | text | Knowledge base name |
| vector_schema | text | Schema of the embeddings table |
| vector_table | text | Embeddings storage table |
| model_name | text | Embedding model used |
| distance_operator | aidb.DistanceOperator | Vector distance function used for retrieval |
| distance_operator_sql | text | The distance operator in SQL-operator syntax (e.g. <->, <=>) |
| vector_data_column | text | Embeddings column |
| vector_key_column | text | Key column in the embeddings table |
| vector_index | jsonb | Vector index configuration |
| pipeline_ids | integer[] | IDs of all pipelines attached to this knowledge base |
| pipeline_names | text[] | Names of all pipelines attached to this knowledge base |
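
As a quick way to inspect this view, the following sketch lists each knowledge base with its model and distance operator, then narrows the result to knowledge bases fed by more than one pipeline. It assumes the aidb extension is installed and uses only the columns listed above; cardinality() is the standard Postgres array function.

```sql
-- Overview of all knowledge bases (columns as documented for this view).
SELECT name, model_name, distance_operator, distance_operator_sql
FROM aidb.knowledge_bases
ORDER BY name;

-- Knowledge bases with more than one attached pipeline.
SELECT name, pipeline_names
FROM aidb.knowledge_bases
WHERE cardinality(pipeline_ids) > 1;
```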

aidb.knowledge_base_stats

Also accessible as aidb.kbstat. Provides current processing statistics for all knowledge base pipelines.

| Column | Type | Description |
|---|---|---|
| name | text | Knowledge base name |
| pipelines | integer | Number of pipelines attached |
| embeddings | bigint | Number of embeddings (the former count(embeddings) value) |
| status | text | Same status field as in aidb.pipeline_metrics, reporting the worst status across all connected pipelines |

Example

SELECT * FROM aidb.knowledge_base_stats;
Output
                 name                 | pipelines | embeddings |  status
--------------------------------------+-----------+------------+----------
 public.pipeline_pipeline__7471a      |         1 |          5 | UpToDate
 public.pipeline_pipeline__7471b      |         1 |          5 | UpToDate
 public.pipeline_animal_facts_kb      |         1 |        372 | UpToDate
 public.pipeline_animal_facts_kb_bert |         1 |        372 | UpToDate
 public.mpkb_shared_vectors           |         2 |          4 | UpToDate
(5 rows)
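
Because status reflects the worst status across the attached pipelines, the view can double as a simple health check. The sketch below assumes that 'UpToDate' (the value shown in the output above) marks a fully processed knowledge base. Other status values are not listed in this reference, so the filter only excludes that one value.

```sql
-- Knowledge bases that still have work pending or need attention.
-- Assumption: any status other than 'UpToDate' is worth a look.
SELECT name, pipelines, embeddings, status
FROM aidb.knowledge_base_stats
WHERE status <> 'UpToDate'
ORDER BY name;
```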

Types

aidb.DistanceOperator

Vector distance function used during retrieval.

| Value | Description |
|---|---|
| L2 | Euclidean distance |
| InnerProduct | Inner product |
| Cosine | Cosine similarity |
| L1 | L1 distance |
| Hamming | Hamming distance |
| Jaccard | Jaccard distance |
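
For intuition about what some of these values measure, the sketch below computes four of them by hand in plain Postgres SQL for two small vectors. It does not use aidb or any vector extension, and the vectors are made-up values. Cosine is shown here as a distance (1 minus cosine similarity), which is an assumption about the convention rather than something this reference states. Hamming and Jaccard are omitted.

```sql
-- Pair up the elements of two example vectors, then aggregate.
WITH pairs AS (
    SELECT a, b
    FROM unnest(ARRAY[1.0, 2.0, 3.0]::float8[],
                ARRAY[2.0, 0.0, 4.0]::float8[]) AS t(a, b)
)
SELECT
    sqrt(sum((a - b) ^ 2))                                 AS l2_distance,
    sum(abs(a - b))                                        AS l1_distance,
    sum(a * b)                                             AS inner_product,
    1 - sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b))) AS cosine_distance
FROM pairs;
```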

aidb.PipelineDataFormat

Format of data in a pipeline source or volume.

| Value | Description |
|---|---|
| Text | Plain text |
| Image | Binary image data |
| Pdf | PDF documents |

aidb.PipelineAutoProcessingMode

Auto-processing mode for a knowledge base pipeline.

| Value | Description |
|---|---|
| Live | Immediate processing via Postgres triggers on each data change |
| Background | Periodic processing via a Postgres background worker |
| Disabled | No automatic processing; run manually with aidb.bulk_embedding() |

Knowledge base functions

aidb.set_auto_knowledge_base

Updates the auto-processing mode for an existing knowledge base.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| mode | aidb.PipelineAutoProcessingMode | Required | New auto-processing mode. |
| batch_size | INTEGER | NULL | Records processed per batch (Background and Disabled modes). |
| background_sync_interval | INTERVAL | NULL | Polling interval (Background mode). |

Examples

SELECT aidb.set_auto_knowledge_base('my_kb', 'Live');
SELECT aidb.set_auto_knowledge_base('my_kb', 'Background', background_sync_interval => '1 minute');
SELECT aidb.set_auto_knowledge_base('my_kb', 'Disabled', batch_size => 200);

aidb.retrieve_key

Returns the source record keys and distances for the top matching embeddings, without fetching source data. For each step in a multi-step pipeline, part_ids also holds the IDs of the parts that match the query. For example, if a "ChunkText" step splits a single source record into 10 parts, part_ids indicates which chunk matched.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| query | TEXT | Required | Query text to search with. |
| topk | INTEGER | 1 | Number of results to return. |
| deduplicate | BOOLEAN | true | Return each source result only once even if multiple "parts" (e.g. chunks/pages) match. |

Returns

| Column | Type | Description |
|---|---|---|
| key | text | Source record key |
| distance | double precision | Vector distance from the query |
| part_ids | bigint[] | IDs of the matching parts (chunks or pages) for each pipeline step |
| pipeline_name | text | Name of the pipeline processing this source record |

Example

SELECT * FROM aidb.retrieve_key('public.pipeline_animal_facts_kb', 'birds', topk=>3);
Output
 key |      distance      | part_ids |  pipeline_name
-----+--------------------+----------+-----------------
 5   | 1.1931772046185758 | {0,0}    | animal_facts_kb
 93  | 1.1980633963685476 | {2,0}    | animal_facts_kb
 98  | 1.2150919866080878 | {2,0}    | animal_facts_kb
(3 rows)
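
The parameters above can be combined in other ways. The following sketch first disables deduplication so that every matching part of a source record is returned, then joins the returned keys back to a source table. The table animal_facts and its columns id and fact are hypothetical names chosen for illustration; substitute the source table your pipeline reads from.

```sql
-- Return every matching part, not just one row per source record.
SELECT key, distance, part_ids
FROM aidb.retrieve_key('public.pipeline_animal_facts_kb', 'birds',
                       topk => 5, deduplicate => false)
ORDER BY distance;

-- Look up the source rows for the matched keys.
-- animal_facts(id, fact) is a hypothetical source table; key is text,
-- so the source key is cast before comparing.
SELECT s.id, s.fact, r.distance
FROM aidb.retrieve_key('public.pipeline_animal_facts_kb', 'birds', topk => 3) AS r
JOIN animal_facts AS s ON s.id::text = r.key
ORDER BY r.distance;
```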

aidb.retrieve_text

Returns source text and distances for the top matching embeddings by joining the embeddings table with the source table. For each step in a multi-step pipeline, part_ids also holds the IDs of the parts that match the query. For example, if a "ChunkText" step splits a single source record into 10 parts, part_ids indicates which chunk matched. The intermediate_steps column contains the result text of those intermediate steps.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| query_string | TEXT | Required | Query text to search with. |
| number_of_results | INTEGER | 0 | Number of results to return. Uses topk default if 0. |

| Parameter | Type | Default | Description |
|---|---|---|---|
| knowledge_base_name | TEXT | Required | Name of the knowledge base. |
| query | TEXT | Required | Query text to search with. |
| topk | INTEGER | 1 | Number of results to return. |
| deduplicate | BOOLEAN | true | Return each source result only once even if multiple "parts" (e.g. chunks/pages) match. |

Returns

| Column | Type | Description |
|---|---|---|
| key | text | Source record key |
| value | text | Source text content |
| distance | double precision | Vector distance from the query |
| part_ids | bigint[] | IDs of the matching parts (chunks or pages) for each pipeline step |
| pipeline_name | text | Name of the pipeline processing this source record |
| intermediate_steps | jsonb | Matching results from the intermediate steps, i.e. the values belonging to part_ids |

Example

SELECT * FROM aidb.retrieve_text('my_kb', 'waterproof jacket', 3);
Output
  key  |                       value                        |      distance
-------+----------------------------------------------------+--------------------
 19337 | Men Stripes Waterproof Shell Jacket                | 0.2994317672742334
 55018 | Women All-Weather Hiking Anorak                    | 0.3804609668507203
(2 rows)
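
The output above shows only three of the returned columns. To see which parts matched and what the intermediate steps produced, a query along these lines selects the remaining columns documented under Returns. This is a sketch: 'my_kb' is the placeholder name from the example above, and jsonb_pretty() is the standard Postgres function for readable JSON output.

```sql
-- Show the matched parts and the intermediate step results per hit.
SELECT key,
       distance,
       part_ids,
       pipeline_name,
       jsonb_pretty(intermediate_steps) AS steps
FROM aidb.retrieve_text('my_kb', 'waterproof jacket', 3);
```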

aidb.delete_knowledge_base

Removes a knowledge base, that is, its destination vector table and its configuration, along with all attached pipelines. The underlying source table is not deleted.

Parameters

| Parameter | Type | Description |
|---|---|---|
| knowledge_base_name | TEXT | Name of the knowledge base to delete. |

Example

SELECT aidb.delete_knowledge_base('my_kb');

Volume functions

aidb.create_volume

Creates an AIDB volume from a PGFS storage location for use as a knowledge base data source. See External data sources for setting up the PGFS storage location first.

Parameters

| Parameter | Type | Description |
|---|---|---|
| name | TEXT | Name for the volume. |
| server_name | TEXT | Name of the PGFS storage location. |
| path | TEXT | Sub-path within the storage location. |
| mime_type | TEXT | Data type: Text, Image, or Pdf. |

Example

SELECT aidb.create_volume('pdf_volume', 'my_s3_location', '/', 'Pdf');

aidb.list_volumes

Lists all AIDB volumes in the database.

SELECT * FROM aidb.list_volumes();

aidb.delete_volume

Deletes an AIDB volume. Note: deleting the underlying PGFS storage location also deletes all volumes built on top of it.

Parameters

| Parameter | Type | Description |
|---|---|---|
| volume_name | TEXT | Name of the volume to delete. |

Example

SELECT aidb.delete_volume('pdf_volume');