How to deploy AI models from the Model Library (Innovation Release)

Prerequisite: Access to the Hybrid Manager UI with AI Factory enabled. See AI Factory in Hybrid Manager.

This guide explains how to deploy AI models from the AI Factory Model Library into Model Serving (powered by KServe) in your Hybrid Manager (HM) environment.

Once deployed, these models power key AI Factory features:

  • Knowledge Bases (via AIDB pipelines)
  • Gen AI Builder Assistants and pipelines
  • Other AI Factory and application integrations

Who should use this guide?

  • AI platform admins deploying validated model images
  • Data engineers configuring AI models for Knowledge Bases
  • AI application developers configuring models for Assistants

What this enables

Once deployed:

  • Your AI models are available in Model Serving.
  • You can link them to Knowledge Bases or Gen AI Builder pipelines.
  • You can monitor and manage deployed models via the HM Model Serving UI or Kubernetes.

Estimated time to complete

10–20 minutes per model, depending on model size and cluster resources.

Prerequisites

Before you begin:

  • An active HM environment with GPU worker nodes configured.
  • Credentials for the model providers you plan to use (NVIDIA NIM and Hugging Face).

If you will pull NVIDIA NIM models from the public internet, you need to prepare two secrets.

The nvidia-nim-secrets secret is used to download NIM profiles:

apiVersion: v1
kind: Secret
metadata:
  name: nvidia-nim-secrets
  namespace: default
  annotations:
    replicator.v1.mittwald.de/replicate-to: m-.*
type: Opaque
data:
  NGC_API_KEY: <base64 encoded NGC API Key>
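
Rather than base64-encoding the key by hand, you can create and annotate an equivalent secret with kubectl. This is a sketch that assumes NGC_API_KEY is already exported in your shell:

```shell
# Create the secret from the plain-text key; kubectl base64-encodes it for you.
kubectl -n default create secret generic nvidia-nim-secrets \
    --from-literal=NGC_API_KEY="${NGC_API_KEY}"

# Annotate it so the replicator copies it into the model-serving namespaces.
kubectl -n default annotate secret nvidia-nim-secrets \
    replicator.v1.mittwald.de/replicate-to='m-.*'
```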

The ngc-cred secret is used to pull NIM container images from nvcr.io:

$ kubectl -n default create secret docker-registry ngc-cred \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=${NGC_API_KEY}

$ kubectl -n default annotate secret ngc-cred \
    replicator.v1.mittwald.de/replicate-to='m-.*'

If you already store NIM profiles in object storage and images in your private registry, you don't need these two secrets. See: How-To Use NVIDIA NIM Model Cache in Air‑Gapped Clusters in Hybrid Manager.

If you will use a private Hugging Face model, create the following secret:

apiVersion: v1
kind: Secret
metadata:
  name: hf-secret
  namespace: default
  annotations:
    replicator.v1.mittwald.de/replicate-to: m-.*
type: Opaque
data:
  HF_TOKEN: <base64 encoded HF API Key>
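
As with the NIM secret, you can create this one with kubectl instead of writing YAML. A sketch, assuming HF_TOKEN is exported in your shell:

```shell
# Create the secret from the plain-text token.
kubectl -n default create secret generic hf-secret \
    --from-literal=HF_TOKEN="${HF_TOKEN}"

# Annotate it so the replicator copies it into the model-serving namespaces.
kubectl -n default annotate secret hf-secret \
    replicator.v1.mittwald.de/replicate-to='m-.*'
```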

Steps to deploy an AI model

1. Creating a model in the Asset Library

  • Go to Asset Library > Models.
  • Select Add New Model.
  • Configure parameters:
    • Model Name, must consist of lowercase alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character.
    • Description
    • Tags
    • Functions, must include at least one that starts with "aidb-".
    • AI Model Provider
      • If you select the NIM provider:
        • Image URL, for example, nvcr.io/nim/openai/gpt-oss-20b:latest.
      • If you select HuggingFace, fill one of the following fields:
        • Hugging Face Model Name, for example, openai/gpt-oss-20b. The model must exist on Hugging Face.
        • Object Storage Path, for example, /models/openai/gpt-oss-20b. The model must already be copied to that path.
    • Default CPU/Memory/GPU
    • API Protocol Version. Select the model's API protocol. If unsure, leave it empty.
    • Max Token Length, the expected context window size of the model. Leave it empty to use the model's default value.
    • README
  • Select Add AI Model.

2. Browsing and selecting a model in the Asset Library

  • Go to Asset Library > Models.
  • Browse models.
  • Select the model you want to deploy.

3. Configuring and deploying the model

  • Select "Create Local Inference Service".
  • Configure deployment parameters:
    • Local Inference Service Name
    • Tags
    • Model Serving Name, the name that you want to use in API calls. Leave it empty to use the default value.
    • Model Profiles Path on Object Storage, the NIM profile path in object storage. Ignore this field unless the model uses the NIM provider. See How-To Use NVIDIA NIM Model Cache in Air‑Gapped Clusters in Hybrid Manager.
    • Inference Service Instances
    • Resource requests/limits (GPU, CPU, and memory)
    • Max Token Length, to override the value defined in the model. Leave it empty to use the value assigned in the model.
  • Select "Create Local Inference Service".

4. Verifying the deployed model

You can verify your deployed models using:

Model Serving UI in HM

  • Go to Estate > Inference Services.
  • Confirm the inference service appears with status Active/Healthy.
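
You can also verify the deployment from the command line with kubectl. A sketch; the namespace and service name below are placeholders for your own values:

```shell
# List all InferenceServices and their readiness across namespaces.
kubectl get inferenceservice -A

# Inspect one service; READY should be True and a URL should be populated.
kubectl -n <namespace> get inferenceservice <service-name> -o wide

# Check the predictor pods backing the service (standard KServe label).
kubectl -n <namespace> get pods \
    -l serving.kserve.io/inferenceservice=<service-name>
```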

5. Connecting the model to AI Factory workloads

Once the model is Ready:

  • You can select it in:

    • Knowledge Base pipelines (for embedding or reranking)
    • Gen AI Builder pipelines
    • Assistant configurations

The UI will show models available for each use case based on their type (Embedding, Completion, Reranking, etc.).
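
As a quick smoke test before wiring a model into a pipeline, you can call a deployed text-completion model directly. The endpoint URL and model name below are placeholders, and this assumes the model exposes an OpenAI-compatible chat completions API (as NIM LLM images typically do):

```shell
# Send a minimal chat completion request to the deployed service.
curl -s http://<inference-service-url>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "<model-serving-name>",
          "messages": [{"role": "user", "content": "Hello"}],
          "max_tokens": 32
        }'
```

A JSON response with a `choices` array indicates the model is serving traffic.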

Supported model types

  Model type         Example model
  ---------------    ----------------------------
  Text Completion    llama-3.3-nemotron-super-49b
  Text Embedding     arctic-embed-l
  Image Embedding    nvclip
  OCR                paddleocr
  Text Reranker      llama-3.2-nv-rerankqa-1b-v2

Tips & best practices

  • GPU placement: Ensure your model matches your GPU capacity. Large models like llama-3.3-49b require multiple GPUs on a single node.
  • Quota management: Limit the number of large models deployed simultaneously to avoid overloading GPU nodes.
  • Version testing: Test new model versions in isolated deployments before promoting to production pipelines or Assistants.

Troubleshooting

Model stuck in Pending

  • Check GPU node taints/labels.
  • Verify InferenceService tolerations and nodeSelectors match.

Model not appearing in Model Library

  • Confirm image is correctly tagged and synced via Image and Model Library.
  • Verify repository rules if using private registry.

Kubernetes errors on deploy

  • Run kubectl describe inferenceservice <model> to see detailed status and events.
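
Beyond describe, the predictor pod logs usually contain the root cause of a failed start. A sketch; the namespace and model name are placeholders, and the container name follows KServe's usual convention:

```shell
# Show the service's conditions and recent events.
kubectl -n <namespace> describe inferenceservice <model>

# Tail the serving container logs of the predictor pods.
kubectl -n <namespace> logs \
    -l serving.kserve.io/inferenceservice=<model> \
    -c kserve-container --tail=100
```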

Summary

  • You can deploy AI models from the AI Factory Model Library.
  • Deployed models run via KServe Model Serving.
  • Deployed models power Knowledge Bases and Gen AI Builder Assistants.
  • The deployment flow ensures consistent governance and visibility.