Examples
The following examples show how to perform bulk knowledge base (KB) embedding using AI pipelines, including the expected output for each stage. These examples cover ingesting data from a standard Postgres table and from an external S3 bucket volume.
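Before walking through the SQL, it can help to see the data flow in miniature: each pipeline reads (key, content) pairs, encodes each content value into a vector, and writes one (source_id, part_ids, value) row per part into the destination table. The following Python sketch is purely illustrative (it is not how aidb is implemented); the constant vector mimics the placeholder embeddings shown in the outputs below.

```python
# Illustrative sketch of the pipeline data flow, NOT aidb internals.
# A stand-in encoder returns a constant vector, mirroring the
# placeholder values shown in the example outputs.

def dummy_encode(text: str) -> list[float]:
    """Stand-in encoder: every input maps to the same small vector."""
    return [0.1, 0.2, 0.3]

def run_pipeline(source_rows: dict[int, str]) -> list[dict]:
    """Encode each (key, content) pair into one knowledge-base row.

    part_ids is [0] because each document fits in a single part here.
    """
    return [
        {"source_id": key, "part_ids": [0], "value": dummy_encode(content)}
        for key, content in source_rows.items()
    ]

rows = run_pipeline({43941: "Catwalk Women Brown Heels"})
print(rows)
# [{'source_id': 43941, 'part_ids': [0], 'value': [0.1, 0.2, 0.3]}]
```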
Example 1: Bulk embedding from a Postgres table
This workflow is used when your source data is already stored within your database.
Step 1: Initialize the AI model
First, register a model that supports text encoding operations:
SELECT aidb.create_model('model__6545', 'bert_local');
 create_model
--------------
 model__6545
(1 row)
Step 2: Prepare the source data
Create a table and insert the records you wish to embed:
CREATE TABLE source_table__6545 (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

INSERT INTO source_table__6545 (id, content)
VALUES
    (43941, 'Catwalk Women Brown Heels'),
    (55018, 'Lakme 3 in 1 Orchid Aqua Shine Lip Color'),
    (19337, 'United Colors of Benetton Men Stripes Black Jacket');
Step 3: Define the pipeline
Create a pipeline that connects the source table to a knowledge base destination:
SELECT aidb.create_pipeline(
    name => 'pipeline__6545',
    source => 'source_table__6545',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config(
        model => 'model__6545',
        data_format => 'Text'
    )
);
INFO: Created pipeline "pipeline__6545"
INFO: Created knowledge base "public.pipeline_pipeline__6545" for pipeline "pipeline__6545"
create_pipeline
-----------------------------------------------------------------------
(pipeline__6545,Table,public,pipeline_pipeline__6545,source_id,value)
(1 row)

Step 4: Execute and verify
Trigger the processing engine and inspect the generated vector table.
SELECT aidb.run_pipeline('pipeline__6545');

-- View the generated embeddings
SELECT * FROM pipeline_pipeline__6545 ORDER BY id;
id | pipeline_id | source_id | part_ids | value
----+-------------+-----------+----------+---------------
1 | 49 | 43941 | {0} | [0.1,0.2,0.3]
2 | 49 | 55018 | {0} | [0.1,0.2,0.3]
3 | 49 | 19337 | {0} | [0.1,0.2,0.3]
(3 rows)

Example 2: Bulk embedding from an S3 bucket (PGFS)
This workflow is used for unstructured data stored in external cloud storage.
Step 1: Configure external storage and volume
Define the S3 bucket location and map it to an AIDB volume.
SELECT pgfs.create_storage_location(
    'bucket__6545',
    's3://noah-html-test',
    options => '{"region": "eu-central-1", "skip_signature": "true"}'
);

SELECT aidb.create_volume('volume__6545', 'bucket__6545', '/', 'Text');
Step 2: List volume content
Verify the files available for processing:
SELECT * FROM aidb.list_volume_content('volume__6545');
    file_name     | size  |      last_modified
------------------+-------+-------------------------
 Bilge Ince.html  |  8878 | 2025-02-19 09:58:29 UTC
 all_tags.html    | 11170 | 2025-02-19 09:58:29 UTC
 hello_world.html |   108 | 2025-02-19 09:58:30 UTC
 simple.html      |   662 | 2025-02-19 09:58:29 UTC
(4 rows)
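The files in this volume are HTML documents, and with data_format => 'Text' the pipeline ingests each file's contents as text. If you want to embed only the visible text rather than the raw markup, you can strip tags before uploading; the following standard-library sketch is one illustrative way to do that client-side (it is not a step aidb performs or requires).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes from an HTML document, discarding tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-whitespace text content.
        if data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    """Return the visible text of an HTML snippet, space-joined."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

print(html_to_text("<html><body><h1>Hello</h1><p>world</p></body></html>"))
# Hello world
```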
Step 3: Create and run the volume pipeline
SELECT aidb.create_pipeline(
    name => 'volume_pipeline__6545',
    source => 'volume__6545',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config(
        model => 'model__6545',
        data_format => 'Text'
    )
);

SELECT aidb.run_pipeline('volume_pipeline__6545');
Step 4: Verify embeddings
SELECT source_id, part_ids, value FROM pipeline_volume_pipeline__6545 ORDER BY source_id;
source_id | part_ids | value
------------------+----------+---------------
Bilge Ince.html | {0} | [0.1,0.2,0.3]
all_tags.html | {0} | [0.1,0.2,0.3]
hello_world.html | {0} | [0.1,0.2,0.3]
simple.html | {0} | [0.1,0.2,0.3]
(4 rows)
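The vectors stored in the value column are what make similarity search possible: a query is encoded with the same model and compared against the stored rows. In practice you would run that comparison inside Postgres through the knowledge base, but the ranking logic can be sketched client-side. The vectors below are made up for illustration (real models produce higher-dimensional, document-specific embeddings, and the function name is hypothetical).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical per-document vectors, shaped like the value column above.
embeddings = {
    "simple.html": [0.1, 0.2, 0.3],
    "hello_world.html": [0.3, 0.2, 0.1],
}

# Rank documents by similarity to an (also hypothetical) query vector.
query = [0.1, 0.2, 0.3]
ranked = sorted(
    embeddings,
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked[0])
# simple.html
```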