Examples (Innovation Release)

The following examples show how to perform bulk knowledge base (KB) embedding using AI pipelines, including the expected output for each stage. They cover ingesting data from a standard Postgres table and from an external S3 bucket volume.

Example 1: Bulk embedding from a Postgres table

This workflow is used when your source data is already stored within your database.

Step 1: Initialize the AI model

First, register a model that supports text encoding operations:

SELECT aidb.create_model('model__6545', 'bert_local');
Output
 create_model 
--------------
 model__6545
(1 row)

Step 2: Prepare the source data

Create a table and insert the records you wish to embed:

CREATE TABLE source_table__6545 (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

INSERT INTO source_table__6545 (id, content)
VALUES (43941, 'Catwalk Women Brown Heels'),
       (55018, 'Lakme 3 in 1 Orchid Aqua Shine Lip Color'),
       (19337, 'United Colors of Benetton Men Stripes Black Jacket');

Step 3: Define the pipeline

Create a pipeline that connects the source table to a knowledge base destination:

SELECT aidb.create_pipeline(
    name => 'pipeline__6545',
    source => 'source_table__6545',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config(
        model => 'model__6545',
        data_format => 'Text'
    )
);
Output
INFO:  Created pipeline "pipeline__6545"
INFO:  Created knowledge base "public.pipeline_pipeline__6545" for pipeline "pipeline__6545"
                               create_pipeline                            
-----------------------------------------------------------------------
 (pipeline__6545,Table,public,pipeline_pipeline__6545,source_id,value)
(1 row)

Step 4: Execute and verify

Trigger the processing engine and inspect the generated vector table.

SELECT aidb.run_pipeline('pipeline__6545');

-- View the generated embeddings
SELECT * FROM pipeline_pipeline__6545 ORDER BY id;
Output
 id | pipeline_id | source_id | part_ids |     value     
----+-------------+-----------+----------+---------------
  1 |          49 | 43941     | {0}      | [0.1,0.2,0.3]
  2 |          49 | 55018     | {0}      | [0.1,0.2,0.3]
  3 |          49 | 19337     | {0}      | [0.1,0.2,0.3]
(3 rows)
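To relate each embedding back to its source row, you can join the generated table to the source table on the key column. This is a plain-SQL sketch; it assumes `source_id` is stored as text in the pipeline table (as the left-aligned values in the output above suggest), so it is cast back to `INT` for the join:

```sql
-- Map each embedding back to its source row.
-- Assumes source_id is stored as text; cast it for the join.
SELECT e.source_id, s.content, e.value
FROM pipeline_pipeline__6545 AS e
JOIN source_table__6545 AS s
  ON s.id = e.source_id::INT
ORDER BY s.id;
```

This is useful for spot-checking that every inserted record produced exactly one embedding.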

Example 2: Bulk embedding from an S3 bucket (PGFS)

This workflow is used for unstructured data stored in external cloud storage.

Step 1: Configure external storage and volume

Define the S3 bucket location and map it to an AIDB volume.

SELECT pgfs.create_storage_location(
    'bucket__6545',
    's3://noah-html-test',
    options => '{"region": "eu-central-1", "skip_signature": "true"}'
);

SELECT aidb.create_volume('volume__6545', 'bucket__6545', '/', 'Text');

Step 2: List volume content

Verify the files available for processing:

SELECT * FROM aidb.list_volume_content('volume__6545');
Output
    file_name     | size  |      last_modified      
------------------+-------+-------------------------
 Bilge Ince.html  |  8878 | 2025-02-19 09:58:29 UTC
 all_tags.html    | 11170 | 2025-02-19 09:58:29 UTC
 hello_world.html |   108 | 2025-02-19 09:58:30 UTC
 simple.html      |   662 | 2025-02-19 09:58:29 UTC
(4 rows)

Step 3: Create and run the volume pipeline

Define a pipeline that reads from the volume, then trigger it:

SELECT aidb.create_pipeline(
    name => 'volume_pipeline__6545',
    source => 'volume__6545',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config(
        model => 'model__6545',
        data_format => 'Text'
    )
);

SELECT aidb.run_pipeline('volume_pipeline__6545');

Step 4: Verify embeddings

Inspect the generated vector table:

SELECT source_id, part_ids, value FROM pipeline_volume_pipeline__6545 ORDER BY source_id;
Output
    source_id     | part_ids |     value     
------------------+----------+---------------
 Bilge Ince.html  | {0}      | [0.1,0.2,0.3]
 all_tags.html    | {0}      | [0.1,0.2,0.3]
 hello_world.html | {0}      | [0.1,0.2,0.3]
 simple.html      | {0}      | [0.1,0.2,0.3]
(4 rows)
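For volume pipelines, the file name serves as the key, so you can retrieve the embedding for a single document directly. A minimal sketch, filtering the table generated above:

```sql
-- Fetch the embedding for one document by its file name,
-- which the pipeline uses as source_id for volume sources.
SELECT source_id, value
FROM pipeline_volume_pipeline__6545
WHERE source_id = 'hello_world.html';
```

When new files are added to the bucket, rerunning the pipeline with aidb.run_pipeline processes the volume again and populates rows for the new documents.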