NVIDIA NIM

NVIDIA NIM (NVIDIA Inference Microservices) exposes 100+ hosted models — Llama, DeepSeek, Nemotron, Qwen, and more — behind a single OpenAI-compatible API at https://integrate.api.nvidia.com/v1. yottacode talks to it through the openai-compatible provider.

Configure

You need an nvapi- API key first — see Register and get an API key below.

In the TUI — add or switch providers without restarting:

/provider                        # open the picker → Add a profile: kind=openai-compatible,
                                 # base URL https://integrate.api.nvidia.com/v1, API key, model
/provider use openai-compatible  # switch to a saved profile
/model meta/llama-3.3-70b-instruct

From the command line — set environment variables:

export YOTTACODE_PROVIDER=openai-compatible
export YOTTACODE_MODEL=meta/llama-3.3-70b-instruct
export YOTTACODE_BASE_URL=https://integrate.api.nvidia.com/v1
export YOTTACODE_API_KEY=nvapi-...

…or pass flags at launch (they override the environment):

yottacode --provider openai-compatible \
  --model meta/llama-3.3-70b-instruct \
  --base-url https://integrate.api.nvidia.com/v1 \
  --api-key nvapi-...

NVIDIA’s endpoint speaks the standard OpenAI wire protocol (/v1/chat/completions and /v1/models), so no native adapter is required.

Register and get an API key

You need a free NVIDIA account and an API key (it starts with nvapi-).

Create a free NVIDIA Developer account

Open the API catalog

Go to build.nvidia.com and browse the model catalog. Each model has a playground plus a code panel showing the OpenAI-compatible request.

Generate your key

Pick any model and click Get API Key. Copy the generated key — it begins with nvapi-. The same key works for every model in the catalog.

Configure yottacode

Set the key and base URL, then choose a model — see Configure above. To save a reusable profile, run the setup wizard and add an openai-compatible entry:

yottacode setup

Tip

New accounts get a pool of free inference credits (1,000 at the time of writing) and a modest request rate limit. These terms are set by NVIDIA and can change — check build.nvidia.com for current limits.

Finding model IDs

The model ID is the namespaced slug shown on each model’s page (the model field in the code sample), for example:

meta/llama-3.3-70b-instruct
meta/llama-3.1-8b-instruct
deepseek-ai/deepseek-r1
nvidia/llama-3.1-nemotron-70b-instruct

You can also list what your key can reach from inside yottacode:

/models

or from the shell:

yottacode doctor

Notes

No billing dashboard. Like other openai-compatible endpoints pointed at integrate.api.nvidia.com, NVIDIA NIM reports token counts only in /usage — there is no per-model dollar figure. See Usage and cost.
Hosted tools. Provider-native hosted tools (web search, code interpreter) are not available over the OpenAI-compatible endpoint; the model can still use yottacode’s local tools.
Self-hosted NIM. If you run NIM containers yourself, point YOTTACODE_BASE_URL at your own /v1 endpoint instead — the configuration is otherwise identical.

OpenAI-compatible