Configuring external storage v7
AIDB pipelines can read from two types of data source:
- Postgres tables — reference a table directly by name using the source parameter in aidb.create_pipeline(). See Creating pipelines.
- External storage volumes — connect S3-compatible object stores, Google Cloud Storage, Azure, or local file systems via PGFS. The rest of this page covers how to set this up.
External storage is accessed through the Postgres File System (PGFS) extension, which maps external storage into Postgres as storage locations. AIDB then wraps each storage location in a volume that pipelines reference by name.
How it works
Connecting external storage to an AIDB pipeline involves two objects:
- PGFS storage location — Defines the external storage provider: its URI, credentials, and connection options. Created with pgfs.create_storage_location().
- AIDB volume — Connects a PGFS storage location to AIDB. Specifies the data format (Text, Image, Pdf) and an optional sub-path within the storage location. Created with aidb.create_volume().
Once a volume exists, reference it as the source in aidb.create_pipeline() exactly as you would a Postgres table name.
Note
The PGFS extension must be installed before creating storage locations. See Configuring AIDB.
Step 1: Create a storage location
Use pgfs.create_storage_location() to define the connection to external storage. The uri parameter identifies the storage backend and path; the options JSONB object carries provider-specific settings such as region and credentials.
S3-compatible object store
-- Private S3 bucket with credentials
SELECT pgfs.create_storage_location(
    name => 'my_s3_location',
    uri => 's3://my-bucket/my-folder',
    options => '{"region": "us-east-1", "access_key_id": "<key>", "secret_access_key": "<secret>"}'
);

-- Public S3 bucket (no credentials required)
SELECT pgfs.create_storage_location(
    name => 'my_public_bucket',
    uri => 's3://aidb-rag-app',
    options => '{"region": "eu-central-1", "skip_signature": "true"}'
);
Local file system
For local file system access, declare the allowed base paths in postgresql.conf before creating the storage location. PGFS restricts access to these paths for security.
# postgresql.conf
pgfs.allowed_local_fs_paths = '/tmp/pgfs'
After restarting Postgres, create the storage location using a file:// URI:
SELECT pgfs.create_storage_location(
    name => 'local_tmp_pgfs',
    uri => 'file:///tmp/pgfs/'
);
For full details on storage location options for S3, GCS, and Azure, see the PGFS documentation.
Step 2: Create a volume
Use aidb.create_volume() to attach a PGFS storage location to AIDB. The volume is what pipelines and SQL functions reference.
SELECT aidb.create_volume(
    name => 'my_volume',
    server => 'my_s3_location',
    path => '/',
    data_format => 'Text'
);
Parameters:
| Parameter | Description |
|---|---|
| name | Unique name for the volume. Used to reference it in pipelines. |
| server | Name of the PGFS storage location to attach. |
| path | Optional path within the storage location. Useful for pointing multiple volumes at different folders in the same bucket. |
| data_format | The type of data in this volume. One of Text, Image, or Pdf. Pipelines use this to choose the correct parsing step. |
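As a concrete illustration of the path parameter, this sketch (bucket folder and volume names are hypothetical) attaches two volumes to different folders of the same storage location:

```sql
-- Two volumes over one storage location, each scoped to its own folder
SELECT aidb.create_volume(
    name => 'contracts_volume',
    server => 'my_s3_location',
    path => 'contracts/',
    data_format => 'Pdf'
);

SELECT aidb.create_volume(
    name => 'scans_volume',
    server => 'my_s3_location',
    path => 'scans/',
    data_format => 'Image'
);
```

Each volume can then feed a separate pipeline with the parsing steps appropriate to its format.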
Note
data_format is metadata — it tells AIDB how to treat the objects, but does not filter them. Ensure the volume only contains objects of the declared format.
Example: PDFs in S3
SELECT pgfs.create_storage_location(
    'pdf_bucket',
    's3://my-docs-bucket',
    options => '{"region": "us-east-1", "access_key_id": "<key>", "secret_access_key": "<secret>"}'
);

SELECT aidb.create_volume('pdf_volume', 'pdf_bucket', '/', 'Pdf');
Example: Images in a local directory
SELECT pgfs.create_storage_location('local_tmp_pgfs', 'file:///tmp/pgfs/');

SELECT aidb.create_volume('ocr_input_volume', 'local_tmp_pgfs', 'ocr_input/', 'Image');
Step 3: Use the volume as a pipeline source
Set the source parameter in aidb.create_pipeline() to the volume name. Volume sources work the same as table sources from the pipeline's perspective:
SELECT aidb.create_pipeline(
    name => 'my_pdf_pipeline',
    source => 'pdf_volume',
    step_1 => 'ParsePdf',
    step_2 => 'ChunkText',
    step_3 => 'KnowledgeBase',
    step_3_options => aidb.knowledge_base_config('bert_local', 'Text')
);
See Create a pipeline for a full walkthrough.
Managing volumes
List and delete
List all volumes:
SELECT aidb.list_volumes();
Delete a volume:
SELECT aidb.delete_volume('my_volume');
Note
Deleting a PGFS storage location also deletes all volumes created on top of it.
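Because of this cascade, delete volumes explicitly before removing the underlying storage location if you want to control teardown order. A sketch, assuming PGFS exposes a pgfs.delete_storage_location() counterpart to pgfs.create_storage_location():

```sql
-- Remove the volume first, then the storage location it was built on
SELECT aidb.delete_volume('my_volume');
SELECT pgfs.delete_storage_location('my_s3_location');  -- assumed PGFS function
```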
Inspect volume contents
Use these functions to verify a volume before attaching it to a pipeline:
-- List all objects in the volume
SELECT * FROM aidb.list_volume_content('my_volume');

-- Read a specific file as BYTEA
SELECT aidb.read_volume_file('my_volume', 'report.pdf');

-- Read a plain text file as text
SELECT convert_from(
    aidb.read_volume_file('my_volume', 'notes.txt'),
    'utf8'
);
These direct access functions are also useful for building custom SQL queries against external storage, independent of any pipeline.
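For example, a quick pre-flight check before attaching a volume to a pipeline might count the objects it exposes and measure one file's size (object names hypothetical):

```sql
-- How many objects does the volume expose?
SELECT count(*) FROM aidb.list_volume_content('my_volume');

-- Size in bytes of a single object (read_volume_file returns BYTEA)
SELECT octet_length(aidb.read_volume_file('my_volume', 'report.pdf'));
```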