Support matrix for AI models Innovation Release

AI Factory supports a variety of AI models for different use cases. The following tables provide an overview of each model's hardware compatibility, measured performance, and support in AIDB and Langflow.

Hardware/Model support

| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | | | | | | | | | | | |
| llama-3.3-nemotron-super-49b-v1 | Completion | | | | | ⚠️ | ⚠️ | | | | | |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | | | | | |
| nemotron-3-nano | Completion | | | | | | | | | | | |
| nvidia-nemotron-nano-9b-v2 | Completion | | | | | | | | | | | |
| gpt-oss-120b | Completion | | | | | | | | | | | |
| gpt-oss-20b | Completion | | | | | | | | | | | |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | | | | | | | | | | | |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | | | | | | | | | | | |
| nv-embedqa-e5-v5 | Embedding (Text) | | | | | | | | | | | |
| nvclip-vit-h-14 | Embedding (Text & Image) | | | | | | | | | | | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | | | | | | | | | | | |
| paddleocr | OCR | | | | | | | | | | | |
| nemoretriever-ocr-v1 | OCR | | | | | | | | | | | |
Note

Models that fail on multi-GPU configurations but succeed on single-GPU configurations don't support multi-GPU inference. You can still run these models by limiting
them to one GPU at deployment time.
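As one illustration of limiting a model to a single GPU, a generic mechanism (not an AI Factory-specific API) is to restrict GPU visibility in the server process's environment before launch. The launch command below is a placeholder; only the `CUDA_VISIBLE_DEVICES` variable itself is a real CUDA convention:

```python
import os

# Minimal sketch: expose only GPU 0 to the model-server process, so a model
# that lacks multi-GPU inference support still deploys on a multi-GPU host.
env = dict(os.environ)
env["CUDA_VISIBLE_DEVICES"] = "0"  # the process will see exactly one GPU

# Hypothetical launch (placeholder command, not an AI Factory API):
# subprocess.Popen(["<model-server-launch-command>"], env=env)
```

In orchestrated deployments the same effect is usually achieved by requesting a single GPU in the deployment's resource limits rather than by setting environment variables directly.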

Performance

| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | TTFT: 30ms<br>Lat: 1546ms<br>RPS: 23.6/s | TTFT: 52ms<br>Lat: 1397ms<br>RPS: 24.7/s | TTFT: 23ms<br>Lat: 3889ms<br>RPS: 8.9/s | TTFT: 25ms<br>Lat: 2952ms<br>RPS: 6.9/s | TTFT: 32ms<br>Lat: 1802ms<br>RPS: 21.1/s | TTFT: 39ms<br>Lat: 1373ms<br>RPS: 27.7/s | TTFT: 36ms<br>Lat: 3847ms<br>RPS: 9.0/s | TTFT: 24ms<br>Lat: 2964ms<br>RPS: 13.1/s | TTFT: 20ms<br>Lat: 2068ms<br>RPS: 15.7/s | TTFT: 17ms<br>Lat: 1627ms<br>RPS: 16.4/s | TTFT: 18ms<br>Lat: 2066ms<br>RPS: 17.5/s |
| llama-3.3-nemotron-super-49b-v1 | Completion | TTFT: 33ms<br>Lat: 13434ms<br>RPS: 3.9/s | TTFT: 27ms<br>Lat: 7205ms<br>RPS: 6.8/s | TTFT: 34ms<br>Lat: 13993ms<br>RPS: 3.8/s | TTFT: 32ms<br>Lat: 7305ms<br>RPS: 6.6/s | TTFT: 47ms<br>Lat: 22440ms | TTFT: 37ms<br>Lat: 13310ms | | | | | |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | TTFT: 53ms<br>Lat: 16827ms<br>RPS: 3.1/s | TTFT: 42ms<br>Lat: 11566ms<br>RPS: 5.2/s | TTFT: 76ms<br>Lat: 31344ms<br>RPS: 0.9/s | TTFT: 54ms<br>Lat: 17152ms<br>RPS: 3.2/s | TTFT: 40ms<br>Lat: 11181ms<br>RPS: 5.1/s | TTFT: 115ms<br>Lat: 31921ms<br>RPS: 2.0/s | | | | | |
| nemotron-3-nano | Completion | TTFT: 610ms<br>Lat: 2975ms<br>RPS: 5.0/s | TTFT: 924ms<br>Lat: 3218ms<br>RPS: 7.5/s | TTFT: 1690ms<br>Lat: 5533ms<br>RPS: 2.3/s | TTFT: 1430ms<br>Lat: 4571ms<br>RPS: 2.5/s | TTFT: 920ms<br>Lat: 3679ms<br>RPS: 4.6/s | TTFT: 571ms<br>Lat: 3136ms<br>RPS: 7.0/s | TTFT: 1529ms<br>Lat: 5887ms<br>RPS: 1.5/s | TTFT: 550ms<br>Lat: 2910ms<br>RPS: 4.7/s | TTFT: 676ms<br>Lat: 2794ms<br>RPS: 4.5/s | TTFT: 555ms<br>Lat: 3003ms<br>RPS: 6.1/s | |
| nvidia-nemotron-nano-9b-v2 | Completion | TTFT: 57ms<br>Lat: 5923ms<br>RPS: 3.0/s | TTFT: 68ms<br>Lat: 4699ms<br>RPS: 3.9/s | TTFT: 75ms<br>Lat: 10126ms<br>RPS: 1.9/s | TTFT: 83ms<br>Lat: 8925ms<br>RPS: 1.8/s | TTFT: 60ms<br>Lat: 6196ms<br>RPS: 3.0/s | TTFT: 67ms<br>Lat: 4515ms<br>RPS: 3.8/s | TTFT: 48ms<br>Lat: 13249ms<br>RPS: 2.1/s | | | | |
| gpt-oss-120b | Completion | TTFT: 606ms<br>Lat: 3058ms<br>RPS: 3.8/s | TTFT: 561ms<br>Lat: 3262ms<br>RPS: 6.5/s | TTFT: 775ms<br>Lat: 4371ms<br>RPS: 2.6/s | TTFT: 744ms<br>Lat: 3577ms<br>RPS: 3.7/s | TTFT: 523ms<br>Lat: 3181ms<br>RPS: 5.9/s | TTFT: 549ms<br>Lat: 3402ms<br>RPS: 5.0/s | TTFT: 451ms<br>Lat: 2273ms<br>RPS: 5.2/s | TTFT: 610ms<br>Lat: 3336ms<br>RPS: 6.5/s | | | |
| gpt-oss-20b | Completion | TTFT: 729ms<br>Lat: 2207ms<br>RPS: 8.7/s | TTFT: 633ms<br>Lat: 2276ms<br>RPS: 13.7/s | TTFT: 1054ms<br>Lat: 2962ms<br>RPS: 6.7/s | TTFT: 1103ms<br>Lat: 2719ms<br>RPS: 4.5/s | TTFT: 1099ms<br>Lat: 2669ms<br>RPS: 8.6/s | TTFT: 1001ms<br>Lat: 2229ms<br>RPS: 12.4/s | TTFT: 1949ms<br>Lat: 4494ms<br>RPS: 4.5/s | TTFT: 1172ms<br>Lat: 3085ms<br>RPS: 7.6/s | TTFT: 900ms<br>Lat: 2423ms<br>RPS: 10.2/s | TTFT: 743ms<br>Lat: 1757ms<br>RPS: 10.7/s | TTFT: 964ms<br>Lat: 2249ms<br>RPS: 11.1/s |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | Dim: 2048<br>Lat: 19ms<br>RPS: 191.4/s | Dim: 2048<br>Lat: 21ms<br>RPS: 161.6/s | Dim: 2048<br>Lat: 15ms<br>RPS: 284.5/s | Dim: 2048<br>Lat: 20ms<br>RPS: 342.9/s | Dim: 2048<br>Lat: 19ms<br>RPS: 182.8/s | Dim: 2048<br>Lat: 25ms<br>RPS: 167.3/s | Dim: 2048<br>Lat: 18ms<br>RPS: 310.2/s | Dim: 2048<br>Lat: 6ms<br>RPS: 439.7/s | Dim: 2048<br>Lat: 13ms<br>RPS: 486.1/s | Dim: 2048<br>Lat: 15ms<br>RPS: 447.6/s | Dim: 2048<br>Lat: 17ms<br>RPS: 476.9/s |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | Dim: 2048<br>Lat: 31ms<br>RPS: 104.2/s | Dim: 2048<br>Lat: 33ms<br>RPS: 103.6/s | Dim: 2048<br>Lat: 22ms<br>RPS: 109.3/s | Dim: 2048<br>Lat: 21ms<br>RPS: 120.6/s | Dim: 2048<br>Lat: 39ms<br>RPS: 103.0/s | Dim: 2048<br>Lat: 34ms<br>RPS: 89.9/s | Dim: 2048<br>Lat: 31ms<br>RPS: 122.8/s | Dim: 2048<br>Lat: 15ms<br>RPS: 202.5/s | Dim: 2048<br>Lat: 15ms<br>RPS: 194.9/s | Dim: 2048<br>Lat: 15ms<br>RPS: 163.6/s | |
| nv-embedqa-e5-v5 | Embedding (Text) | Dim: 1024<br>Lat: 36ms<br>RPS: 62.2/s | Dim: 1024<br>Lat: 38ms<br>RPS: 60.2/s | Dim: 1024<br>Lat: 20ms<br>RPS: 117.8/s | Dim: 1024<br>Lat: 19ms<br>RPS: 133.0/s | Dim: 1024<br>Lat: 32ms<br>RPS: 57.0/s | Dim: 1024<br>Lat: 45ms<br>RPS: 51.8/s | Dim: 1024<br>Lat: 27ms<br>RPS: 114.7/s | Dim: 1024<br>Lat: 21ms<br>RPS: 134.7/s | Dim: 1024<br>Lat: 18ms<br>RPS: 136.8/s | Dim: 1024<br>Lat: 18ms<br>RPS: 147.1/s | |
| nvclip-vit-h-14 | Embedding (Text & Image) | Dim: 1024<br>ILat: 52ms<br>IRPS: 33.7/s<br>Lat: 200ms<br>RPS: 6.0/s | Dim: 1024<br>ILat: 64ms<br>IRPS: 41.6/s<br>Lat: 170ms<br>RPS: 14.0/s | Dim: 1024<br>ILat: 25ms<br>IRPS: 33.3/s<br>Lat: 93ms<br>RPS: 11.5/s | Dim: 1024<br>ILat: 26ms<br>IRPS: 20.1/s<br>Lat: 103ms<br>RPS: 22.0/s | Dim: 1024<br>ILat: 53ms<br>IRPS: 35.4/s<br>Lat: 164ms<br>RPS: 7.1/s | Dim: 1024<br>ILat: 62ms<br>IRPS: 37.0/s<br>Lat: 180ms<br>RPS: 13.7/s | Dim: 1024<br>ILat: 31ms<br>IRPS: 72.1/s<br>Lat: 144ms<br>RPS: 8.0/s | Dim: 1024<br>ILat: 57ms<br>IRPS: 20.9/s<br>Lat: 88ms<br>RPS: 24.4/s | Dim: 1024<br>ILat: 555ms<br>IRPS: 17.1/s<br>Lat: 87ms<br>RPS: 35.5/s | Dim: 1024<br>ILat: 1046ms<br>IRPS: 17.0/s<br>Lat: 96ms<br>RPS: 47.8/s | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | Lat: 26ms<br>RPS: 694.2/s | Lat: 26ms<br>RPS: 733.9/s | Lat: 16ms<br>RPS: 502.0/s | Lat: 16ms<br>RPS: 569.8/s | Lat: 30ms<br>RPS: 676.4/s | Lat: 33ms<br>RPS: 478.8/s | Lat: 23ms<br>RPS: 507.9/s | Lat: 13ms<br>RPS: 584.6/s | Lat: 13ms<br>RPS: 240.7/s | Lat: 12ms<br>RPS: 689.0/s | Lat: 13ms<br>RPS: 639.1/s |
| paddleocr | OCR | Lat: 115ms<br>RPS: 21.3/s | Lat: 119ms<br>RPS: 40.8/s | Lat: 60ms<br>RPS: 45.2/s | Lat: 61ms<br>RPS: 75.8/s | Lat: 124ms<br>RPS: 16.6/s | Lat: 118ms<br>RPS: 40.4/s | Lat: 93ms<br>RPS: 27.6/s | Lat: 28ms<br>RPS: 38.5/s | Lat: 81ms<br>RPS: 116.6/s | Lat: 136ms<br>RPS: 181.5/s | Lat: 188ms<br>RPS: 212.3/s |
| nemoretriever-ocr-v1 | OCR | Lat: 43ms<br>RPS: 23.8/s | Lat: 60ms<br>RPS: 25.9/s | Lat: 33ms<br>RPS: 32.8/s | Lat: 32ms<br>RPS: 33.0/s | Lat: 55ms<br>RPS: 26.4/s | Lat: 51ms<br>RPS: 30.4/s | Lat: 43ms<br>RPS: 30.2/s | Lat: 26ms<br>RPS: 45.2/s | Lat: 26ms<br>RPS: 44.4/s | Lat: 26ms<br>RPS: 45.1/s | |

Model/AIDB & Langflow support

| Model | Type | AIDB supported | Langflow supported |
|---|---|---|---|
| llama-3.1-8b-instruct | Completion | | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1 | Completion | | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | | ✅ chat ✅ agentic |
| nemotron-3-nano | Completion | | ✅ chat ✅ agentic |
| nvidia-nemotron-nano-9b-v2 | Completion | | ✅ chat ✅ agentic |
| gpt-oss-120b | Completion | | ✅ chat ✅ agentic |
| gpt-oss-20b | Completion | | ✅ chat ✅ agentic |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | | ✅ embeddings |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | | ✅ embeddings |
| nv-embedqa-e5-v5 | Embedding (Text) | | ✅ embeddings |
| nvclip-vit-h-14 | Embedding (Text & Image) | | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | | |
| paddleocr | OCR | | |
| nemoretriever-ocr-v1 | OCR | | |

Test data & methodology

| Test type | Data source | Avg input size | Input range | Latency requests | Throughput requests | Concurrency |
|---|---|---|---|---|---|---|
| Completion | Alpaca instructions + input | ~19 tokens (77 chars) | 29–234 chars | 30 | 200 | 100 |
| Embedding (Text) | Alpaca instructions + output | ~77 tokens (308 chars) | 46–993 chars | 30 | 200 | 100 |
| Embedding (Image) | Test image (PNG) | 10.3 KB | | 100 | 200 | 100 |
| Reranking | 5 hardcoded passages | | | 100 | 200 | 100 |
| OCR | Test image (PNG) | 10.3 KB | | 100 | 200 | 100 |
  • Completion inputs: Instruction and input fields from the Stanford Alpaca dataset.
  • Embedding inputs: Instruction and output fields concatenated for longer, more realistic texts (up to ~1024 tokens).
  • Completion tests: Responses are limited to a maximum of 1024 tokens, with streaming enabled so that both TTFT and full-request latency can be measured.
  • Latency: Sequential requests measuring per-request response time.
  • Throughput: Concurrent requests measuring requests per second (RPS).
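The latency and throughput procedures above can be sketched roughly as follows. `send_request` is a hypothetical stand-in for a single model call, and the defaults mirror the table (sequential requests for latency; 200 requests at concurrency 100 for throughput):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latency(send_request, n=30):
    """Sequential requests; returns average per-request latency in ms."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        times.append((time.perf_counter() - start) * 1000)
    return sum(times) / len(times)


def measure_throughput(send_request, total=200, concurrency=100):
    """Concurrent requests; returns overall requests per second (RPS)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_request) for _ in range(total)]
        for f in futures:
            f.result()  # propagate any request error
    return total / (time.perf_counter() - start)
```

Note that RPS is computed over the whole concurrent batch, so it reflects sustained throughput rather than the inverse of single-request latency.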

Legend

Support table: ✅ = all tests passed, ⚠️ = model runs but some tests failed (partial success), ❌ = failed to start (container error), — = not tested

Performance table:

| Abbr | Meaning |
|---|---|
| TTFT | Time to first token — avg latency until the first streamed token arrives (ms) |
| Lat | Request latency — avg end-to-end time for a full sequential request (ms) |
| RPS | Requests per second — throughput under concurrent load |
| Dim | Embedding vector dimensions |
| ILat | Image request latency — same as Lat but for image inputs (ms) |
| IRPS | Image requests per second — same as RPS but for image inputs |
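The distinction between TTFT and Lat for a streamed completion can be sketched as below; `stream` is a hypothetical iterable of response chunks, not a real client API:

```python
import time


def measure_streaming(stream):
    """Return (ttft_ms, total_ms) for one streamed request.

    `stream` is any iterable yielding response chunks: TTFT is the time
    until the first chunk arrives, total is the time to exhaust the stream.
    """
    start = time.perf_counter()
    first = None
    for _chunk in stream:
        if first is None:
            first = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start          # full request latency
    if first is None:
        raise ValueError("stream produced no chunks")
    return first * 1000, total * 1000
```

By construction TTFT is always less than or equal to Lat for the same request, which matches how the two columns relate in the performance table.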

Hardware Configurations

  • 2x H100 NVL — Intel(R) Xeon(R) Gold 6548Y+ (128 cores), 1007 GB RAM
  • 2x A100 80GB PCIe — AMD EPYC 7773X 64-Core Processor (30 cores), 147 GB RAM
  • 2x L40S — AMD EPYC 9354 32-Core Processor (128 cores), 503 GB RAM
  • 4x RTX PRO 6000 Blackwell Server Edition — AMD EPYC 9355 32-Core Processor (60 cores), 566 GB RAM
