Support matrix for AI models (Innovation Release)
This documentation covers the current Innovation Release of
EDB Postgres AI. See also:
- Hybrid Manager dual release strategy
- Documentation for the current Long-term support release
AI Factory supports a variety of AI models for different use cases. The following tables show which models run on which GPU configurations, how they perform on each, and which models AIDB and Langflow support.
Hardware/Model support
| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| llama-3.3-nemotron-super-49b-v1 | Completion | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ⚠️ | ❌ | ⚠️ |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | ⚠️ | ⚠️ | ❌ | ⚠️ | ⚠️ | ⚠️ | ❌ | ⚠️ | ❌ | ❌ | ❌ |
| nemotron-3-nano | Completion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| nvidia-nemotron-nano-9b-v2 | Completion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| gpt-oss-120b | Completion | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| gpt-oss-20b | Completion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| nv-embedqa-e5-v5 | Embedding (Text) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| nvclip-vit-h-14 | Embedding (Text & Image) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| paddleocr | OCR | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| nemoretriever-ocr-v1 | OCR | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
Note
Models that fail on multi-GPU configurations but succeed on single-GPU configurations don't support multi-GPU inference. You can still run these models by limiting
them to one GPU at deployment time.
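As an illustration only, one common way to pin a containerized model to a single GPU is at container launch. The sketch below assumes a NIM-style container started with Docker; the image tag, port, and launch flow are placeholders, and a Hybrid Manager deployment may instead expose an equivalent GPU-count setting in its own deployment configuration.

```python
import subprocess

# Illustrative sketch only: pin a NIM-style model container to one GPU.
# The image tag and port are placeholders; adjust to your environment.
subprocess.run([
    "docker", "run", "--rm",
    "--gpus", "device=0",            # expose only GPU 0 to the container
    "-e", "CUDA_VISIBLE_DEVICES=0",  # also restrict GPU visibility inside the container
    "-p", "8000:8000",
    "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",  # placeholder image tag
], check=True)
```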
Performance
| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | TTFT: 30ms Lat: 1546ms RPS: 23.6/s | TTFT: 52ms Lat: 1397ms RPS: 24.7/s | TTFT: 23ms Lat: 3889ms RPS: 8.9/s | TTFT: 25ms Lat: 2952ms RPS: 6.9/s | TTFT: 32ms Lat: 1802ms RPS: 21.1/s | TTFT: 39ms Lat: 1373ms RPS: 27.7/s | TTFT: 36ms Lat: 3847ms RPS: 9.0/s | TTFT: 24ms Lat: 2964ms RPS: 13.1/s | TTFT: 20ms Lat: 2068ms RPS: 15.7/s | TTFT: 17ms Lat: 1627ms RPS: 16.4/s | TTFT: 18ms Lat: 2066ms RPS: 17.5/s |
| llama-3.3-nemotron-super-49b-v1 | Completion | TTFT: 33ms Lat: 13434ms RPS: 3.9/s | TTFT: 27ms Lat: 7205ms RPS: 6.8/s | | | TTFT: 34ms Lat: 13993ms RPS: 3.8/s | TTFT: 32ms Lat: 7305ms RPS: 6.6/s | | | TTFT: 47ms Lat: 22440ms | | TTFT: 37ms Lat: 13310ms |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | TTFT: 53ms Lat: 16827ms RPS: 3.1/s | TTFT: 42ms Lat: 11566ms RPS: 5.2/s | | TTFT: 76ms Lat: 31344ms RPS: 0.9/s | TTFT: 54ms Lat: 17152ms RPS: 3.2/s | TTFT: 40ms Lat: 11181ms RPS: 5.1/s | | TTFT: 115ms Lat: 31921ms RPS: 2.0/s | | | |
| nemotron-3-nano | Completion | TTFT: 610ms Lat: 2975ms RPS: 5.0/s | TTFT: 924ms Lat: 3218ms RPS: 7.5/s | TTFT: 1690ms Lat: 5533ms RPS: 2.3/s | TTFT: 1430ms Lat: 4571ms RPS: 2.5/s | TTFT: 920ms Lat: 3679ms RPS: 4.6/s | TTFT: 571ms Lat: 3136ms RPS: 7.0/s | TTFT: 1529ms Lat: 5887ms RPS: 1.5/s | TTFT: 550ms Lat: 2910ms RPS: 4.7/s | TTFT: 676ms Lat: 2794ms RPS: 4.5/s | TTFT: 555ms Lat: 3003ms RPS: 6.1/s | |
| nvidia-nemotron-nano-9b-v2 | Completion | TTFT: 57ms Lat: 5923ms RPS: 3.0/s | TTFT: 68ms Lat: 4699ms RPS: 3.9/s | TTFT: 75ms Lat: 10126ms RPS: 1.9/s | TTFT: 83ms Lat: 8925ms RPS: 1.8/s | TTFT: 60ms Lat: 6196ms RPS: 3.0/s | TTFT: 67ms Lat: 4515ms RPS: 3.8/s | TTFT: 48ms Lat: 13249ms RPS: 2.1/s | | | | |
| gpt-oss-120b | Completion | TTFT: 606ms Lat: 3058ms RPS: 3.8/s | TTFT: 561ms Lat: 3262ms RPS: 6.5/s | | TTFT: 775ms Lat: 4371ms RPS: 2.6/s | TTFT: 744ms Lat: 3577ms RPS: 3.7/s | TTFT: 523ms Lat: 3181ms RPS: 5.9/s | | | TTFT: 549ms Lat: 3402ms RPS: 5.0/s | TTFT: 451ms Lat: 2273ms RPS: 5.2/s | TTFT: 610ms Lat: 3336ms RPS: 6.5/s |
| gpt-oss-20b | Completion | TTFT: 729ms Lat: 2207ms RPS: 8.7/s | TTFT: 633ms Lat: 2276ms RPS: 13.7/s | TTFT: 1054ms Lat: 2962ms RPS: 6.7/s | TTFT: 1103ms Lat: 2719ms RPS: 4.5/s | TTFT: 1099ms Lat: 2669ms RPS: 8.6/s | TTFT: 1001ms Lat: 2229ms RPS: 12.4/s | TTFT: 1949ms Lat: 4494ms RPS: 4.5/s | TTFT: 1172ms Lat: 3085ms RPS: 7.6/s | TTFT: 900ms Lat: 2423ms RPS: 10.2/s | TTFT: 743ms Lat: 1757ms RPS: 10.7/s | TTFT: 964ms Lat: 2249ms RPS: 11.1/s |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | Dim: 2048 Lat: 19ms RPS: 191.4/s | Dim: 2048 Lat: 21ms RPS: 161.6/s | Dim: 2048 Lat: 15ms RPS: 284.5/s | Dim: 2048 Lat: 20ms RPS: 342.9/s | Dim: 2048 Lat: 19ms RPS: 182.8/s | Dim: 2048 Lat: 25ms RPS: 167.3/s | Dim: 2048 Lat: 18ms RPS: 310.2/s | Dim: 2048 Lat: 6ms RPS: 439.7/s | Dim: 2048 Lat: 13ms RPS: 486.1/s | Dim: 2048 Lat: 15ms RPS: 447.6/s | Dim: 2048 Lat: 17ms RPS: 476.9/s |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | Dim: 2048 Lat: 31ms RPS: 104.2/s | Dim: 2048 Lat: 33ms RPS: 103.6/s | Dim: 2048 Lat: 22ms RPS: 109.3/s | Dim: 2048 Lat: 21ms RPS: 120.6/s | Dim: 2048 Lat: 39ms RPS: 103.0/s | Dim: 2048 Lat: 34ms RPS: 89.9/s | Dim: 2048 Lat: 31ms RPS: 122.8/s | | Dim: 2048 Lat: 15ms RPS: 202.5/s | Dim: 2048 Lat: 15ms RPS: 194.9/s | Dim: 2048 Lat: 15ms RPS: 163.6/s |
| nv-embedqa-e5-v5 | Embedding (Text) | Dim: 1024 Lat: 36ms RPS: 62.2/s | Dim: 1024 Lat: 38ms RPS: 60.2/s | Dim: 1024 Lat: 20ms RPS: 117.8/s | Dim: 1024 Lat: 19ms RPS: 133.0/s | Dim: 1024 Lat: 32ms RPS: 57.0/s | Dim: 1024 Lat: 45ms RPS: 51.8/s | Dim: 1024 Lat: 27ms RPS: 114.7/s | Dim: 1024 Lat: 21ms RPS: 134.7/s | | Dim: 1024 Lat: 18ms RPS: 136.8/s | Dim: 1024 Lat: 18ms RPS: 147.1/s |
| nvclip-vit-h-14 | Embedding (Text & Image) | Dim: 1024 ILat: 52ms IRPS: 33.7/s Lat: 200ms RPS: 6.0/s | Dim: 1024 ILat: 64ms IRPS: 41.6/s Lat: 170ms RPS: 14.0/s | Dim: 1024 ILat: 25ms IRPS: 33.3/s Lat: 93ms RPS: 11.5/s | Dim: 1024 ILat: 26ms IRPS: 20.1/s Lat: 103ms RPS: 22.0/s | Dim: 1024 ILat: 53ms IRPS: 35.4/s Lat: 164ms RPS: 7.1/s | Dim: 1024 ILat: 62ms IRPS: 37.0/s Lat: 180ms RPS: 13.7/s | Dim: 1024 ILat: 31ms IRPS: 72.1/s Lat: 144ms RPS: 8.0/s | | Dim: 1024 ILat: 57ms IRPS: 20.9/s Lat: 88ms RPS: 24.4/s | Dim: 1024 ILat: 555ms IRPS: 17.1/s Lat: 87ms RPS: 35.5/s | Dim: 1024 ILat: 1046ms IRPS: 17.0/s Lat: 96ms RPS: 47.8/s |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | Lat: 26ms RPS: 694.2/s | Lat: 26ms RPS: 733.9/s | Lat: 16ms RPS: 502.0/s | Lat: 16ms RPS: 569.8/s | Lat: 30ms RPS: 676.4/s | Lat: 33ms RPS: 478.8/s | Lat: 23ms RPS: 507.9/s | Lat: 13ms RPS: 584.6/s | Lat: 13ms RPS: 240.7/s | Lat: 12ms RPS: 689.0/s | Lat: 13ms RPS: 639.1/s |
| paddleocr | OCR | Lat: 115ms RPS: 21.3/s | Lat: 119ms RPS: 40.8/s | Lat: 60ms RPS: 45.2/s | Lat: 61ms RPS: 75.8/s | Lat: 124ms RPS: 16.6/s | Lat: 118ms RPS: 40.4/s | Lat: 93ms RPS: 27.6/s | Lat: 28ms RPS: 38.5/s | Lat: 81ms RPS: 116.6/s | Lat: 136ms RPS: 181.5/s | Lat: 188ms RPS: 212.3/s |
| nemoretriever-ocr-v1 | OCR | Lat: 43ms RPS: 23.8/s | Lat: 60ms RPS: 25.9/s | Lat: 33ms RPS: 32.8/s | Lat: 32ms RPS: 33.0/s | Lat: 55ms RPS: 26.4/s | Lat: 51ms RPS: 30.4/s | Lat: 43ms RPS: 30.2/s | | Lat: 26ms RPS: 45.2/s | Lat: 26ms RPS: 44.4/s | Lat: 26ms RPS: 45.1/s |
Model/AIDB & Langflow support
| Model | Type | AIDB supported | Langflow supported |
|---|---|---|---|
| llama-3.1-8b-instruct | Completion | ✅ | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1 | Completion | ✅ | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | ✅ | ✅ chat ✅ agentic |
| nemotron-3-nano | Completion | ✅ | ✅ chat ✅ agentic |
| nvidia-nemotron-nano-9b-v2 | Completion | ✅ | ✅ chat ✅ agentic |
| gpt-oss-120b | Completion | ✅ | ✅ chat ✅ agentic |
| gpt-oss-20b | Completion | ✅ | ✅ chat ✅ agentic |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | ✅ | ✅ embeddings |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | ✅ | ✅ embeddings |
| nv-embedqa-e5-v5 | Embedding (Text) | ✅ | ✅ embeddings |
| nvclip-vit-h-14 | Embedding (Text & Image) | ✅ | — |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | ✅ | — |
| paddleocr | OCR | ✅ | — |
| nemoretriever-ocr-v1 | OCR | ✅ | — |
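For orientation, deployed completion and embedding models of this kind are typically reached through an OpenAI-compatible HTTP API. The sketch below assumes such an endpoint; the base URL is a placeholder, and the `input_type` field is an assumption based on how some NIM retriever embedding models behave, so check your model's API reference.

```python
import requests

BASE = "http://localhost:8000/v1"  # placeholder endpoint for a deployed model

# Completion request against a chat-capable model from the table above.
chat = requests.post(f"{BASE}/chat/completions", json={
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Summarize WAL archiving in one sentence."}],
    "max_tokens": 128,
}, timeout=120).json()
print(chat["choices"][0]["message"]["content"])

# Embedding request; the returned vector length should match the Dim column
# in the performance table (2048 for llama-3.2-nemoretriever-300m-embed-v1).
emb = requests.post(f"{BASE}/embeddings", json={
    "model": "llama-3.2-nemoretriever-300m-embed-v1",
    "input": ["EDB Postgres AI supports vector search."],
    "input_type": "passage",  # assumption: some NIM retriever models expect an input type
}, timeout=60).json()
print(len(emb["data"][0]["embedding"]))  # expect 2048
```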
Test data & methodology
| Test type | Data source | Avg input size | Input range | Latency requests | Throughput requests | Concurrency |
|---|---|---|---|---|---|---|
| Completion | Alpaca instructions + input | ~19 tokens (77 chars) | 29–234 chars | 30 | 200 | 100 |
| Embedding (Text) | Alpaca instructions + output | ~77 tokens (308 chars) | 46–993 chars | 30 | 200 | 100 |
| Embedding (Image) | Test image (PNG) | 10.3 KB | — | 100 | 200 | 100 |
| Reranking | 5 hardcoded passages | — | — | 100 | 200 | 100 |
| OCR | Test image (PNG) | 10.3 KB | — | 100 | 200 | 100 |
- Completion inputs: Instruction and input fields from the Stanford Alpaca dataset.
- Embedding inputs: Instruction and output fields concatenated for longer, more realistic texts (up to ~1024 tokens).
- Completion tests: Responses are limited to a maximum of 1024 tokens, with streaming enabled for latency measurement (TTFT + full request).
- Latency: Sequential requests measuring per-request response time.
- Throughput: Concurrent requests measuring requests per second (RPS). A minimal sketch of this measurement loop follows this list.
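To make these definitions concrete, here is a minimal sketch of the measurement loop described above, using the request counts from the methodology table (30 sequential latency requests, 200 throughput requests at concurrency 100). It assumes an OpenAI-compatible chat endpoint at a placeholder URL; it is not the harness that produced the numbers in this document.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE = "http://localhost:8000/v1"  # placeholder OpenAI-compatible endpoint
PAYLOAD = {
    "model": "llama-3.1-8b-instruct",  # any completion model from the tables
    "messages": [{"role": "user", "content": "Give three uses for PostgreSQL."}],
    "max_tokens": 1024,
    "stream": True,  # streaming exposes the first token for TTFT measurement
}

def timed_request():
    """Run one streamed completion; return (ttft, total_latency) in seconds."""
    start = time.perf_counter()
    ttft = None
    with requests.post(f"{BASE}/chat/completions", json=PAYLOAD,
                       stream=True, timeout=300) as resp:
        for line in resp.iter_lines():
            if line and ttft is None:
                ttft = time.perf_counter() - start  # first streamed chunk arrived
    return ttft, time.perf_counter() - start

# Latency (TTFT and Lat): sequential requests, per the methodology table.
results = [timed_request() for _ in range(30)]
print("avg TTFT:", statistics.mean(t for t, _ in results))
print("avg Lat: ", statistics.mean(l for _, l in results))

# Throughput (RPS): 200 requests at concurrency 100.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    list(pool.map(lambda _: timed_request(), range(200)))
print("RPS:", 200 / (time.perf_counter() - start))
```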
Legend
Support table: ✅ = all tests passed, ⚠️ = model runs but some tests failed (partial success), ❌ = failed to start (container error), — = not tested
Performance table:
| Abbr | Meaning |
|---|---|
| TTFT | Time to first token — avg latency until the first streamed token arrives (ms) |
| Lat | Request latency — avg end-to-end time for a full sequential request (ms) |
| RPS | Requests per second — throughput under concurrent load |
| Dim | Embedding vector dimensions |
| ILat | Image request latency — same as Lat but for image inputs (ms) |
| IRPS | Image requests per second — same as RPS but for image inputs |
Hardware configurations
- 2x H100 NVL — Intel(R) Xeon(R) Gold 6548Y+ (128 cores), 1007 GB RAM
- 2x A100 80GB PCIe — AMD EPYC 7773X 64-Core Processor (30 cores), 147 GB RAM
- 2x L40S — AMD EPYC 9354 32-Core Processor (128 cores), 503 GB RAM
- 4x RTX PRO 6000 Blackwell Server Edition — AMD EPYC 9355 32-Core Processor (60 cores), 566 GB RAM