Functions reference v7

Reference for AIDB standalone SQL functions that transform data directly in queries, without requiring a pipeline. For guide-style documentation and usage examples, see SQL functions.

For AI inference functions (encode_text, decode_text, rerank_text, and related), see Models reference.


aidb.chunk_text

Divides a text string into smaller, semantically coherent segments.

Parameters

ParameterTypeDescription
inputTEXTThe text to chunk.
optionsJSONBChunking configuration (see below).

Options

KeyTypeDefaultDescription
desired_lengthintegerRequiredTarget segment size. Acts as a strict upper limit if max_length is omitted.
max_lengthintegerNULLUpper bound for chunk size. Chunks extend past desired_length only to preserve semantic boundaries.
overlap_lengthinteger0Amount of content to repeat between consecutive chunks, to preserve cross-boundary context.
strategytext'chars'Chunking unit: 'chars' (character-based) or 'words' (word-based). Determines the unit for desired_length, max_length, and overlap_length.

Returns

ColumnTypeDescription
part_idintegerZero-based segment index.
chunktextThe text segment.

Example

SELECT * FROM aidb.chunk_text(
    input   => 'Long text here...',
    options => '{"desired_length": 120, "max_length": 150}'
);

aidb.parse_html

Extracts readable text from an HTML string, stripping tags while preserving structure.

Parameters

ParameterTypeDescription
htmlTEXTThe HTML string to parse.
optionsJSONBParsing configuration (see below).

Options

KeyTypeDefaultDescription
methodtext'StructuredPlaintext'Parsing method: 'StructuredPlaintext' (plain text extraction) or 'StructuredMarkdown' (Markdown-like output that retains headers and lists).

Returns

TEXT — the extracted text content.

Example

SELECT aidb.parse_html(
    html    => '<h1>Hello</h1><p>World</p>',
    options => '{"method": "StructuredPlaintext"}'
);

aidb.parse_pdf

Extracts text from binary PDF data. Returns one row per page.

Parameters

ParameterTypeDescription
bytesBYTEARaw PDF binary data.
optionsJSONBParsing configuration (see below).

Options

KeyTypeDefaultDescription
methodtext'Structured'Parsing method. Currently 'Structured' (spec-based text block extraction).
allow_partial_parsingbooleantrueWhen true, continues parsing when errors are encountered on individual pages, returning as much data as possible.

Returns

ColumnTypeDescription
part_idintegerPage index (zero-based) from which the text was extracted.
texttextExtracted text for that page.

Example

SELECT * FROM aidb.parse_pdf(
    bytes   => pg_read_binary_file('/path/to/doc.pdf')::BYTEA,
    options => '{"allow_partial_parsing": true}'
);

aidb.perform_ocr

Extracts text from image data using a registered OCR-capable model.

Parameters

ParameterTypeDescription
inputBYTEARaw binary image data.
optionsJSONBOCR configuration (see below).

Options

KeyTypeDescription
modeltextName of a registered OCR-capable model (for example, one using the nim_ocr provider).

Returns

ColumnTypeDescription
part_idintegerText block index. A single image may produce multiple rows if the provider returns multiple text segments.
texttextExtracted text for that block.

Example

SELECT * FROM aidb.perform_ocr(
    input   => pg_read_binary_file('/path/to/image.png')::BYTEA,
    options => '{"model": "my_ocr_model"}'
);

aidb.summarize_text

Generates a concise summary of a text string using a registered language model.

Parameters

ParameterTypeDescription
inputTEXTThe text to summarize.
optionsJSONBSummarization configuration (see below).

Options

KeyTypeDefaultDescription
modeltextRequiredName of a registered model that supports decode_text.
prompttext(standard prompt)Custom instruction to guide the summary style, for example 'Summarize for a 5th grader'.
chunk_configJSONBNULLChunking configuration to apply before summarization when input exceeds the model's context window. Accepts the same keys as aidb.chunk_text options.
strategytext'append'Summarization strategy: 'append' (summarize each chunk independently and concatenate) or 'reduce' (iteratively summarize until the desired length is reached).
reduction_factorinteger3Used with the 'reduce' strategy. Controls how aggressively each iteration reduces the text.

Returns

TEXT — the generated summary.

Example

SELECT aidb.summarize_text(
    input   => 'Long article text here...',
    options => '{"model": "my_t5_model"}'
);

aidb.summarize_text_aggregate

Aggregate version of aidb.summarize_text. Accumulates text from all rows in each group, then sends the combined result to the LLM for summarization. Returns one summary per GROUP BY group. Empty and NULL rows are skipped.

Parameters

ParameterTypeRequiredDescription
inputTEXTYesText column from each row. Empty/NULL rows are skipped.
optionsJSONYesConfiguration object built with aidb.summarize_text_config(). Must contain model at minimum.

Options

Accepts the same options as aidb.summarize_text (see above). Note that options is required for the aggregate, not optional.

Returns

TEXT — the summary for the group.

Usage

SELECT category,
       aidb.summarize_text_aggregate(
           text_column,
           aidb.summarize_text_config('my_t5_model')::json ORDER BY id
       ) AS summary
FROM my_table
GROUP BY category;

Parameter defaults quick reference

FunctionParameterSQL defaultRuntime default
summarize_textoptions'{}'Must include model
summarize_text_aggregateinput— (required)
summarize_text_aggregateoptions— (required)
summarize_text_configmodel— (required)
summarize_text_configchunk_configNULLNo chunking
summarize_text_configpromptNULLStandard summarize prompt
summarize_text_configstrategyNULL'append'
summarize_text_configreduction_factorNULL3
summarize_text_configinference_configNULLProvider defaults
chunk_text_configdesired_length— (required)
chunk_text_configmax_lengthNULLSame as desired_length
chunk_text_configoverlap_lengthNULL0
chunk_text_configstrategyNULL'chars'