Support matrix for AI models Innovation Release

AI Factory supports a variety of AI models for different use cases. The following tables provide an overview of each model's hardware compatibility, measured performance, and support in AIDB and Langflow.

Hardware/Model support

| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | | | | | | | | | | | |
| llama-3.3-nemotron-super-49b-v1 | Completion | | | | | ⚠️ | ⚠️ | | | | | |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | ⚠️ | | | | | |
| nemotron-3-nano | Completion | | | | | | | | | | | |
| nvidia-nemotron-nano-9b-v2 | Completion | | | | | | | | | | | |
| gpt-oss-120b | Completion | | | | | | | | | | | |
| gpt-oss-20b | Completion | | | | | | | | | | | |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | | | | | | | | | | | |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | | | | | | | | | | | |
| nv-embedqa-e5-v5 | Embedding (Text) | | | | | | | | | | | |
| nvclip-vit-h-14 | Embedding (Text & Image) | | | | | | | | | | | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | | | | | | | | | | | |
| paddleocr | OCR | | | | | | | | | | | |
| nemoretriever-ocr-v1 | OCR | | | | | | | | | | | |
Note

Models that fail on multi-GPU configurations but succeed on single-GPU configurations don't support multi-GPU inference. You can still run these models by limiting
them to one GPU at deployment time.
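As one illustration of limiting a model to a single GPU, a generic mechanism (not an AI Factory-specific API) is to restrict GPU visibility in the server process's environment before launch. The launch command below is a placeholder; only the `CUDA_VISIBLE_DEVICES` variable itself is a real CUDA convention:

```python
import os

# Minimal sketch: expose only GPU 0 to the model-server process, so a model
# that lacks multi-GPU inference support still deploys on a multi-GPU host.
env = dict(os.environ)
env["CUDA_VISIBLE_DEVICES"] = "0"  # the process will see exactly one GPU

# Hypothetical launch (placeholder command, not an AI Factory API):
# subprocess.Popen(["<model-server-launch-command>"], env=env)
```

In orchestrated deployments the same effect is usually achieved by requesting a single GPU in the deployment's resource limits rather than by setting environment variables directly.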

Performance

| Model | Type | 1x H100 NVL | 2x H100 NVL | 1x A100 80GB PCIe | 2x A100 80GB PCIe | 1x H100 NVL | 2x H100 NVL | 1x L40S | 1x RTX PRO 6000 Blackwell | 2x RTX PRO 6000 Blackwell | 3x RTX PRO 6000 Blackwell | 4x RTX PRO 6000 Blackwell |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama-3.1-8b-instruct | Completion | TTFT: 30ms<br>Lat: 1546ms<br>RPS: 23.6/s | TTFT: 52ms<br>Lat: 1397ms<br>RPS: 24.7/s | TTFT: 23ms<br>Lat: 3889ms<br>RPS: 8.9/s | TTFT: 25ms<br>Lat: 2952ms<br>RPS: 6.9/s | TTFT: 32ms<br>Lat: 1802ms<br>RPS: 21.1/s | TTFT: 39ms<br>Lat: 1373ms<br>RPS: 27.7/s | TTFT: 36ms<br>Lat: 3847ms<br>RPS: 9.0/s | TTFT: 24ms<br>Lat: 2964ms<br>RPS: 13.1/s | TTFT: 20ms<br>Lat: 2068ms<br>RPS: 15.7/s | TTFT: 17ms<br>Lat: 1627ms<br>RPS: 16.4/s | TTFT: 18ms<br>Lat: 2066ms<br>RPS: 17.5/s |
| llama-3.3-nemotron-super-49b-v1 | Completion | TTFT: 33ms<br>Lat: 13434ms<br>RPS: 3.9/s | TTFT: 27ms<br>Lat: 7205ms<br>RPS: 6.8/s | TTFT: 34ms<br>Lat: 13993ms<br>RPS: 3.8/s | TTFT: 32ms<br>Lat: 7305ms<br>RPS: 6.6/s | TTFT: 47ms<br>Lat: 22440ms | TTFT: 37ms<br>Lat: 13310ms | | | | | |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | TTFT: 53ms<br>Lat: 16827ms<br>RPS: 3.1/s | TTFT: 42ms<br>Lat: 11566ms<br>RPS: 5.2/s | TTFT: 76ms<br>Lat: 31344ms<br>RPS: 0.9/s | TTFT: 54ms<br>Lat: 17152ms<br>RPS: 3.2/s | TTFT: 40ms<br>Lat: 11181ms<br>RPS: 5.1/s | TTFT: 115ms<br>Lat: 31921ms<br>RPS: 2.0/s | | | | | |
| nemotron-3-nano | Completion | TTFT: 610ms<br>Lat: 2975ms<br>RPS: 5.0/s | TTFT: 924ms<br>Lat: 3218ms<br>RPS: 7.5/s | TTFT: 1690ms<br>Lat: 5533ms<br>RPS: 2.3/s | TTFT: 1430ms<br>Lat: 4571ms<br>RPS: 2.5/s | TTFT: 920ms<br>Lat: 3679ms<br>RPS: 4.6/s | TTFT: 571ms<br>Lat: 3136ms<br>RPS: 7.0/s | TTFT: 1529ms<br>Lat: 5887ms<br>RPS: 1.5/s | TTFT: 550ms<br>Lat: 2910ms<br>RPS: 4.7/s | TTFT: 676ms<br>Lat: 2794ms<br>RPS: 4.5/s | TTFT: 555ms<br>Lat: 3003ms<br>RPS: 6.1/s | |
| nvidia-nemotron-nano-9b-v2 | Completion | TTFT: 57ms<br>Lat: 5923ms<br>RPS: 3.0/s | TTFT: 68ms<br>Lat: 4699ms<br>RPS: 3.9/s | TTFT: 75ms<br>Lat: 10126ms<br>RPS: 1.9/s | TTFT: 83ms<br>Lat: 8925ms<br>RPS: 1.8/s | TTFT: 60ms<br>Lat: 6196ms<br>RPS: 3.0/s | TTFT: 67ms<br>Lat: 4515ms<br>RPS: 3.8/s | TTFT: 48ms<br>Lat: 13249ms<br>RPS: 2.1/s | | | | |
| gpt-oss-120b | Completion | TTFT: 606ms<br>Lat: 3058ms<br>RPS: 3.8/s | TTFT: 561ms<br>Lat: 3262ms<br>RPS: 6.5/s | TTFT: 775ms<br>Lat: 4371ms<br>RPS: 2.6/s | TTFT: 744ms<br>Lat: 3577ms<br>RPS: 3.7/s | TTFT: 523ms<br>Lat: 3181ms<br>RPS: 5.9/s | TTFT: 549ms<br>Lat: 3402ms<br>RPS: 5.0/s | TTFT: 451ms<br>Lat: 2273ms<br>RPS: 5.2/s | TTFT: 610ms<br>Lat: 3336ms<br>RPS: 6.5/s | | | |
| gpt-oss-20b | Completion | TTFT: 729ms<br>Lat: 2207ms<br>RPS: 8.7/s | TTFT: 633ms<br>Lat: 2276ms<br>RPS: 13.7/s | TTFT: 1054ms<br>Lat: 2962ms<br>RPS: 6.7/s | TTFT: 1103ms<br>Lat: 2719ms<br>RPS: 4.5/s | TTFT: 1099ms<br>Lat: 2669ms<br>RPS: 8.6/s | TTFT: 1001ms<br>Lat: 2229ms<br>RPS: 12.4/s | TTFT: 1949ms<br>Lat: 4494ms<br>RPS: 4.5/s | TTFT: 1172ms<br>Lat: 3085ms<br>RPS: 7.6/s | TTFT: 900ms<br>Lat: 2423ms<br>RPS: 10.2/s | TTFT: 743ms<br>Lat: 1757ms<br>RPS: 10.7/s | TTFT: 964ms<br>Lat: 2249ms<br>RPS: 11.1/s |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | Dim: 2048<br>Lat: 19ms<br>RPS: 191.4/s | Dim: 2048<br>Lat: 21ms<br>RPS: 161.6/s | Dim: 2048<br>Lat: 15ms<br>RPS: 284.5/s | Dim: 2048<br>Lat: 20ms<br>RPS: 342.9/s | Dim: 2048<br>Lat: 19ms<br>RPS: 182.8/s | Dim: 2048<br>Lat: 25ms<br>RPS: 167.3/s | Dim: 2048<br>Lat: 18ms<br>RPS: 310.2/s | Dim: 2048<br>Lat: 6ms<br>RPS: 439.7/s | Dim: 2048<br>Lat: 13ms<br>RPS: 486.1/s | Dim: 2048<br>Lat: 15ms<br>RPS: 447.6/s | Dim: 2048<br>Lat: 17ms<br>RPS: 476.9/s |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | Dim: 2048<br>Lat: 31ms<br>RPS: 104.2/s | Dim: 2048<br>Lat: 33ms<br>RPS: 103.6/s | Dim: 2048<br>Lat: 22ms<br>RPS: 109.3/s | Dim: 2048<br>Lat: 21ms<br>RPS: 120.6/s | Dim: 2048<br>Lat: 39ms<br>RPS: 103.0/s | Dim: 2048<br>Lat: 34ms<br>RPS: 89.9/s | Dim: 2048<br>Lat: 31ms<br>RPS: 122.8/s | Dim: 2048<br>Lat: 15ms<br>RPS: 202.5/s | Dim: 2048<br>Lat: 15ms<br>RPS: 194.9/s | Dim: 2048<br>Lat: 15ms<br>RPS: 163.6/s | |
| nv-embedqa-e5-v5 | Embedding (Text) | Dim: 1024<br>Lat: 36ms<br>RPS: 62.2/s | Dim: 1024<br>Lat: 38ms<br>RPS: 60.2/s | Dim: 1024<br>Lat: 20ms<br>RPS: 117.8/s | Dim: 1024<br>Lat: 19ms<br>RPS: 133.0/s | Dim: 1024<br>Lat: 32ms<br>RPS: 57.0/s | Dim: 1024<br>Lat: 45ms<br>RPS: 51.8/s | Dim: 1024<br>Lat: 27ms<br>RPS: 114.7/s | Dim: 1024<br>Lat: 21ms<br>RPS: 134.7/s | Dim: 1024<br>Lat: 18ms<br>RPS: 136.8/s | Dim: 1024<br>Lat: 18ms<br>RPS: 147.1/s | |
| nvclip-vit-h-14 | Embedding (Text & Image) | Dim: 1024<br>ILat: 52ms<br>IRPS: 33.7/s<br>Lat: 200ms<br>RPS: 6.0/s | Dim: 1024<br>ILat: 64ms<br>IRPS: 41.6/s<br>Lat: 170ms<br>RPS: 14.0/s | Dim: 1024<br>ILat: 25ms<br>IRPS: 33.3/s<br>Lat: 93ms<br>RPS: 11.5/s | Dim: 1024<br>ILat: 26ms<br>IRPS: 20.1/s<br>Lat: 103ms<br>RPS: 22.0/s | Dim: 1024<br>ILat: 53ms<br>IRPS: 35.4/s<br>Lat: 164ms<br>RPS: 7.1/s | Dim: 1024<br>ILat: 62ms<br>IRPS: 37.0/s<br>Lat: 180ms<br>RPS: 13.7/s | Dim: 1024<br>ILat: 31ms<br>IRPS: 72.1/s<br>Lat: 144ms<br>RPS: 8.0/s | Dim: 1024<br>ILat: 57ms<br>IRPS: 20.9/s<br>Lat: 88ms<br>RPS: 24.4/s | Dim: 1024<br>ILat: 555ms<br>IRPS: 17.1/s<br>Lat: 87ms<br>RPS: 35.5/s | Dim: 1024<br>ILat: 1046ms<br>IRPS: 17.0/s<br>Lat: 96ms<br>RPS: 47.8/s | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | Lat: 26ms<br>RPS: 694.2/s | Lat: 26ms<br>RPS: 733.9/s | Lat: 16ms<br>RPS: 502.0/s | Lat: 16ms<br>RPS: 569.8/s | Lat: 30ms<br>RPS: 676.4/s | Lat: 33ms<br>RPS: 478.8/s | Lat: 23ms<br>RPS: 507.9/s | Lat: 13ms<br>RPS: 584.6/s | Lat: 13ms<br>RPS: 240.7/s | Lat: 12ms<br>RPS: 689.0/s | Lat: 13ms<br>RPS: 639.1/s |
| paddleocr | OCR | Lat: 115ms<br>RPS: 21.3/s | Lat: 119ms<br>RPS: 40.8/s | Lat: 60ms<br>RPS: 45.2/s | Lat: 61ms<br>RPS: 75.8/s | Lat: 124ms<br>RPS: 16.6/s | Lat: 118ms<br>RPS: 40.4/s | Lat: 93ms<br>RPS: 27.6/s | Lat: 28ms<br>RPS: 38.5/s | Lat: 81ms<br>RPS: 116.6/s | Lat: 136ms<br>RPS: 181.5/s | Lat: 188ms<br>RPS: 212.3/s |
| nemoretriever-ocr-v1 | OCR | Lat: 43ms<br>RPS: 23.8/s | Lat: 60ms<br>RPS: 25.9/s | Lat: 33ms<br>RPS: 32.8/s | Lat: 32ms<br>RPS: 33.0/s | Lat: 55ms<br>RPS: 26.4/s | Lat: 51ms<br>RPS: 30.4/s | Lat: 43ms<br>RPS: 30.2/s | Lat: 26ms<br>RPS: 45.2/s | Lat: 26ms<br>RPS: 44.4/s | Lat: 26ms<br>RPS: 45.1/s | |

Model/AIDB & Langflow support

| Model | Type | AIDB supported | Langflow supported |
|---|---|---|---|
| llama-3.1-8b-instruct | Completion | | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1 | Completion | | ✅ chat ✅ agentic |
| llama-3.3-nemotron-super-49b-v1.5 | Completion | | ✅ chat ✅ agentic |
| nemotron-3-nano | Completion | | ✅ chat ✅ agentic |
| nvidia-nemotron-nano-9b-v2 | Completion | | ✅ chat ✅ agentic |
| gpt-oss-120b | Completion | | ✅ chat ✅ agentic |
| gpt-oss-20b | Completion | | ✅ chat ✅ agentic |
| llama-3.2-nemoretriever-300m-embed-v1 | Embedding (Text) | | ✅ embeddings |
| llama-3.2-nv-embedqa-1b-v2 | Embedding (Text) | | ✅ embeddings |
| nv-embedqa-e5-v5 | Embedding (Text) | | ✅ embeddings |
| nvclip-vit-h-14 | Embedding (Text & Image) | | |
| llama-3.2-nv-rerankqa-1b-v2 | Reranking | | |
| paddleocr | OCR | | |
| nemoretriever-ocr-v1 | OCR | | |

Test data & methodology

| Test type | Data source | Avg input size | Input range | Latency requests | Throughput requests | Concurrency |
|---|---|---|---|---|---|---|
| Completion | Alpaca instructions + input | ~19 tokens (77 chars) | 29–234 chars | 30 | 200 | 100 |
| Embedding (Text) | Alpaca instructions + output | ~77 tokens (308 chars) | 46–993 chars | 30 | 200 | 100 |
| Embedding (Image) | Test image (PNG) | 10.3 KB | | 100 | 200 | 100 |
| Reranking | 5 hardcoded passages | | | 100 | 200 | 100 |
| OCR | Test image (PNG) | 10.3 KB | | 100 | 200 | 100 |
  • Completion inputs: Instruction and input fields from the Stanford Alpaca dataset.
  • Embedding inputs: Instruction and output fields concatenated for longer, more realistic texts (up to ~1024 tokens).
  • Completion tests: Responses are limited to a maximum of 1024 tokens, with streaming enabled so that both TTFT and full-request latency can be measured.
  • Latency: Sequential requests measuring per-request response time.
  • Throughput: Concurrent requests measuring requests per second (RPS).
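The latency and throughput procedures above can be sketched roughly as follows. `send_request` is a hypothetical stand-in for a single model call, and the defaults mirror the table (sequential requests for latency; 200 requests at concurrency 100 for throughput):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latency(send_request, n=30):
    """Sequential requests; returns average per-request latency in ms."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        times.append((time.perf_counter() - start) * 1000)
    return sum(times) / len(times)


def measure_throughput(send_request, total=200, concurrency=100):
    """Concurrent requests; returns overall requests per second (RPS)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_request) for _ in range(total)]
        for f in futures:
            f.result()  # propagate any request error
    return total / (time.perf_counter() - start)
```

Note that RPS is computed over the whole concurrent batch, so it reflects sustained throughput rather than the inverse of single-request latency.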

Legend

Support table: ✅ = all tests passed, ⚠️ = model runs but some tests failed (partial success), ❌ = failed to start (container error), — = not tested

Performance table:

| Abbr | Meaning |
|---|---|
| TTFT | Time to first token — avg latency until the first streamed token arrives (ms) |
| Lat | Request latency — avg end-to-end time for a full sequential request (ms) |
| RPS | Requests per second — throughput under concurrent load |
| Dim | Embedding vector dimensions |
| ILat | Image request latency — same as Lat but for image inputs (ms) |
| IRPS | Image requests per second — same as RPS but for image inputs |
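The distinction between TTFT and Lat for a streamed completion can be sketched as below; `stream` is a hypothetical iterable of response chunks, not a real client API:

```python
import time


def measure_streaming(stream):
    """Return (ttft_ms, total_ms) for one streamed request.

    `stream` is any iterable yielding response chunks: TTFT is the time
    until the first chunk arrives, total is the time to exhaust the stream.
    """
    start = time.perf_counter()
    first = None
    for _chunk in stream:
        if first is None:
            first = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start          # full request latency
    if first is None:
        raise ValueError("stream produced no chunks")
    return first * 1000, total * 1000
```

By construction TTFT is always less than or equal to Lat for the same request, which matches how the two columns relate in the performance table.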

Hardware Configurations

  • 2x H100 NVL — Intel(R) Xeon(R) Gold 6548Y+ (128 cores), 1007 GB RAM
  • 2x A100 80GB PCIe — AMD EPYC 7773X 64-Core Processor (30 cores), 147 GB RAM
  • 2x L40S — AMD EPYC 9354 32-Core Processor (128 cores), 503 GB RAM
  • 4x RTX PRO 6000 Blackwell Server Edition — AMD EPYC 9355 32-Core Processor (60 cores), 566 GB RAM
