EDB Model Server v1.4.0 (LTS)

The EDB Model Server component connects a flow to an inference model running on a Hybrid Manager (HM) model server cluster. Use it for chat-style language models, reasoning models, and other inference endpoints that you've deployed to HM.

Choosing the right component

Use the EDB Model Server component when:

  • You want a language-model response in your flow from a model running on an HM-hosted model server cluster.
  • You want a LanguageModel object to plug into an agent or downstream chain.
  • You're using an OpenAI-compatible endpoint (the default) or a native NVIDIA NIM endpoint.

Use a different component if:

  • You want embeddings rather than text responses. Use EDB Embeddings.
  • You want CPU-based inference running on the database cluster rather than on a model server. Use EDB Embedded Models.

Prerequisites

  • An HM model server cluster with the target model deployed. See Models for how to create one. A GPU node is typically required.

  • An HM machine-user access key saved in Langflow as a Global Variable (default name HM_API_KEY).

Inputs

Prompt

FieldTypeRequiredDefaultNotes
InputMessageNoThe user message to send to the model. Connect from upstream Chat Input or other text-producing components.
System MessageMultiline textNoSystem prompt passed to the model.
StreamBooleanNofalseAdvanced. Stream the model's response tokens as they arrive. Only supported by models that expose a streaming endpoint.

Connection

FieldTypeRequiredDefaultNotes
Hybrid Manager URLTextNoAdvanced. Override the default HM URL.
HM Machine User KeySecretYesHM_API_KEYDefaults to the global variable named HM_API_KEY.
Hybrid Manager Model Server Cluster InstanceDropdownYesThe model server cluster to call. Populated from your HM model clusters.
External IngressBooleanNoAdvanced. Route through the external ingress instead of the in-cluster service.

Model

FieldTypeRequiredDefaultNotes
API ClientDropdownNoOpenAIAdvanced. OpenAI is required for correct tool-calling behavior in Agent flows. NVIDIA only when connecting to a native NIM endpoint that needs NVIDIA-specific features like detailed thinking.
Model NameDropdownYesPopulated from the selected model server cluster. Use the refresh button if the list is empty.
TemperatureSliderNo0.1Controls randomness. Range 0–1, step 0.01.
Max TokensIntegerNoAdvanced. Maximum tokens to generate. Range 0–128000. Set to 0 for unlimited.
SeedIntegerNo1Advanced. Controls reproducibility.
Default model queryTextNoAdvanced. Sent when no input is connected to the component. Useful for testing a model that does not accept empty input.

API Client = OpenAI (default)

FieldTypeRequiredDefaultNotes
Model KwargsDictNoAdvanced. Additional keyword arguments passed to the OpenAI chat client.
JSON ModeBooleanNofalseAdvanced. Force the model to return JSON.
Max RetriesIntegerNo5Advanced. Maximum retries on a failed request.
TimeoutIntegerNo700Advanced. Per-request timeout in seconds.

API Client = NVIDIA

FieldTypeRequiredDefaultNotes
Detailed ThinkingBooleanNofalseAdvanced. Return the model's detailed thought process. Only supported by NVIDIA reasoning models.

The OpenAI-specific fields hide when API Client is NVIDIA, and vice versa. Field values are preserved when you switch between clients.

Outputs

OutputTypeCarries
Model ResponseMessageThe model's text response to the input prompt.
Language ModelLanguageModelA LangChain LanguageModel instance pointing at the selected HM model server cluster. Pass this into an agent or chain that needs a model handle.