Skip to content
YottaCode v0.2.0 is out! πŸŽ‰ See the release notes β†—

NVIDIA NIM

NVIDIA NIM (NVIDIA Inference Microservices) exposes 100+ hosted models β€” Llama, DeepSeek, Nemotron, Qwen, and more β€” behind a single OpenAI-compatible API at https://integrate.api.nvidia.com/v1. yottacode talks to it through the openai-compatible provider.

Configure

You need an nvapi- API key first β€” see Register and get an API key below.

In the TUI β€” add or switch providers without restarting:

/provider                        # open the picker β†’ Add a profile: kind=openai-compatible,
                                 # base URL https://integrate.api.nvidia.com/v1, API key, model
/provider use openai-compatible  # switch to a saved profile
/model meta/llama-3.3-70b-instruct

From the command line β€” set environment variables:

export YOTTACODE_PROVIDER=openai-compatible
export YOTTACODE_MODEL=meta/llama-3.3-70b-instruct
export YOTTACODE_BASE_URL=https://integrate.api.nvidia.com/v1
export YOTTACODE_API_KEY=nvapi-...

…or pass flags at launch (they override the environment):

yottacode --provider openai-compatible \
  --model meta/llama-3.3-70b-instruct \
  --base-url https://integrate.api.nvidia.com/v1 \
  --api-key nvapi-...

NVIDIA’s endpoint speaks the standard OpenAI wire protocol (/v1/chat/completions and /v1/models), so no native adapter is required.

Register and get an API key

You need a free NVIDIA account and an API key (it starts with nvapi-).

Create a free NVIDIA Developer account

Sign up at developer.nvidia.com. The NVIDIA Developer Program is free to join.

Open the API catalog

Go to build.nvidia.com and browse the model catalog. Each model has a playground plus a code panel showing the OpenAI-compatible request.

Generate your key

Pick any model and click Get API Key. Copy the generated key β€” it begins with nvapi-. The same key works for every model in the catalog.

Configure yottacode

Set the key and base URL, then choose a model β€” see Configure above. To save a reusable profile, run the setup wizard and add an openai-compatible entry:

yottacode setup

Tip

New accounts get a pool of free inference credits (1,000 at the time of writing) and a modest request rate limit. These terms are set by NVIDIA and can change β€” check build.nvidia.com for current limits.

Finding model IDs

The model ID is the namespaced slug shown on each model’s page (the model field in the code sample), for example:

  • meta/llama-3.3-70b-instruct
  • meta/llama-3.1-8b-instruct
  • deepseek-ai/deepseek-r1
  • nvidia/llama-3.1-nemotron-70b-instruct

You can also list what your key can reach from inside yottacode:

/models

or from the shell:

yottacode doctor

Notes

  • No billing dashboard. Like other openai-compatible endpoints pointed at integrate.api.nvidia.com, NVIDIA NIM reports token counts only in /usage β€” there is no per-model dollar figure. See Usage and cost.
  • Hosted tools. Provider-native hosted tools (web search, code interpreter) are not available over the OpenAI-compatible endpoint; the model can still use yottacode’s local tools.
  • Self-hosted NIM. If you run NIM containers yourself, point YOTTACODE_BASE_URL at your own /v1 endpoint instead β€” the configuration is otherwise identical.