OpenAI-compatible

Point yottacode at any endpoint that speaks the OpenAI wire protocol. Use provider kind openai-compatible or pass the base URL directly.

Configure

In the TUI — add or switch providers without restarting:

/provider                        # open the picker → Add a profile: kind=openai-compatible, base URL, API key, model
/provider use openai-compatible  # switch to a saved profile
/model <model-id>                # switch model for this session

From the command line — set environment variables:

export YOTTACODE_PROVIDER=openai-compatible
export YOTTACODE_MODEL=llama-3.1-70b
export YOTTACODE_BASE_URL=https://example.com/v1
export YOTTACODE_API_KEY=...

…or pass flags at launch (they override the environment):

yottacode --provider openai-compatible \
  --model llama-3.1-70b \
  --base-url https://example.com/v1 \
  --api-key ...

This works with many gateways and self-hosted runtimes that expose /v1/chat/completions and /v1/models.

Tested examples include NVIDIA NIM, Groq, vLLM, and Llama Stack. Other gateways that speak the same wire protocol should work but are not formally validated.

Note

Using NVIDIA’s hosted catalog at build.nvidia.com? See the dedicated NVIDIA NIM page for registration and setup.

Tool-argument tolerance. Some open models on these endpoints emit numeric and boolean tool arguments as JSON strings — {"max_results":"5"} instead of {"max_results":5}. This is a model trait, not a host one: Meta Llama 3.1/3.3 instruct do it (on NIM, Ollama, vLLM, etc.), while NVIDIA’s own Nemotron, Mistral, Qwen, and DeepSeek emit properly-typed JSON. yottacode normalizes these against each tool’s schema before the tool runs, so affected models work without configuration. A model that instead emits the whole tool call as plain text (rather than a structured call) is a separate limitation that normalization cannot fix.

Ollama NVIDIA NIM