Configuring models
A model id tells yottacode which model to send each turn to. Model names are provider-specific.
Set the startup model
Environment variable:
export YOTTACODE_MODEL=<your-model-id>
Flag:
yottacode --model <your-model-id> --base-url https://api.openai.com/v1 --api-key sk-...
Config file:
[active]
provider = "openai"
default_model = "<your-model-id>"
Switch models in the TUI
/model <your-model-id>
This changes the active model for the current session and rebuilds the adapter. It does not change your shell environment; a YOTTACODE_MODEL export, for example, keeps its old value.
Manage default models from the CLI
yottacode model list
yottacode model list --all
yottacode model use <your-model-id>
yottacode model fetch
yottacode model fetch openai
model use updates default_model in the [active] section of your config file.
Fetch live models
yottacode model fetch openai
This calls the provider /models endpoint and prints the merged model list. It is useful when checking auth, endpoint shape, or whether a newly released model is visible to your account.
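If you want to check model visibility by hand, an OpenAI-compatible /models endpoint usually returns a JSON object whose data array holds model objects with an id field. The sketch below assumes that response shape; models_url and visible_model_ids are hypothetical helper names, and this is not yottacode's own fetch code.

```python
import json


def models_url(base_url: str) -> str:
    """Join a base URL such as https://api.openai.com/v1 with the /models path."""
    return base_url.rstrip("/") + "/models"


def visible_model_ids(payload: str) -> set[str]:
    """Extract ids from an OpenAI-style list response: {"data": [{"id": ...}, ...]}."""
    body = json.loads(payload)
    return {entry["id"] for entry in body.get("data", [])}


if __name__ == "__main__":
    # Canned response in the assumed shape. A live check would GET models_url(...)
    # with an "Authorization: Bearer <key>" header instead.
    sample = (
        '{"object": "list", "data": ['
        '{"id": "model-a", "object": "model"}, {"id": "model-b", "object": "model"}]}'
    )
    print(models_url("https://api.openai.com/v1/"))
    print("model-a" in visible_model_ids(sample))
```

If a model id you expect is missing from that set, the problem is account access or endpoint choice rather than yottacode configuration.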
Reasoning models
Some models expose reasoning streams or summaries:
| Provider/model family | Behavior |
|---|---|
| OpenAI `o1*`, `o3*`, `o4*`, `gpt-5*` | Routed to Responses API when appropriate |
| openai-auth account models | Use the ChatGPT-authenticated backend and surface reasoning summaries where available |
| copilot account models | Use the GitHub Copilot backend; available models depend on subscription tier (Free/Pro/Pro+) |
| xAI Grok reasoning models | Reasoning content is surfaced when present |
| Ollama thinking models | Reasoning fields are surfaced when the OpenAI shim provides them |
| Standard chat models | Stream final content only |
Use --reasoning-effort low|medium|high when the selected provider/model supports it.
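The OpenAI row in the table matches model ids by glob-style prefix. A minimal sketch of that matching with Python's fnmatch, plus a check that --reasoning-effort is one of its three values, could look like this. The function names are hypothetical, and yottacode's real routing also applies conditions ("when appropriate") that this sketch does not model.

```python
from fnmatch import fnmatchcase

# Patterns copied from the OpenAI row of the table.
RESPONSES_API_PATTERNS = ("o1*", "o3*", "o4*", "gpt-5*")
REASONING_EFFORTS = {"low", "medium", "high"}


def matches_responses_family(model_id: str) -> bool:
    """True if the model id matches one of the reasoning-family patterns (hypothetical helper)."""
    return any(fnmatchcase(model_id, pattern) for pattern in RESPONSES_API_PATTERNS)


def check_reasoning_effort(value: str) -> str:
    """Accept only low, medium, or high; raise otherwise."""
    if value not in REASONING_EFFORTS:
        raise ValueError(
            f"--reasoning-effort must be one of {sorted(REASONING_EFFORTS)}, got {value!r}"
        )
    return value


if __name__ == "__main__":
    for model in ("o3-mini", "gpt-5", "gpt-4o"):
        print(model, matches_responses_family(model))
    print(check_reasoning_effort("medium"))
```

fnmatchcase is used rather than fnmatch so matching stays case-sensitive on every platform.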
Hosted tools by model/provider
Hosted provider tools depend on provider support, not just the model name.
- OpenAI: web_search default-on; code_interpreter optional
- xAI: web_search default-on; x_search and code_interpreter optional
- Ollama/custom OpenAI-compatible: no hosted provider tools; local yottacode tools still work
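One way to picture the list is as a per-provider table of default-on and optional hosted tools. The sketch below transcribes the three bullets; the provider keys, the HOSTED_TOOLS structure, and the opt_in argument are illustrative assumptions, not yottacode's configuration format.

```python
from collections.abc import Iterable

# Default-on and optional hosted tools per provider, transcribed from the list.
# Keys and structure are illustrative only.
HOSTED_TOOLS = {
    "openai": {"default": {"web_search"}, "optional": {"code_interpreter"}},
    "xai": {"default": {"web_search"}, "optional": {"x_search", "code_interpreter"}},
    # Ollama and custom OpenAI-compatible endpoints: no hosted tools.
    "ollama": {"default": set(), "optional": set()},
}


def hosted_tools(provider: str, opt_in: Iterable[str] = ()) -> set[str]:
    """Return the hosted tools enabled for a provider, given the optional ones you opt in to.

    Unknown providers are treated like custom OpenAI-compatible endpoints (no hosted
    tools); local yottacode tools are unaffected either way.
    """
    spec = HOSTED_TOOLS.get(provider, {"default": set(), "optional": set()})
    requested = set(opt_in)
    unknown = requested - spec["optional"]
    if unknown:
        raise ValueError(f"{provider} does not offer hosted tools: {sorted(unknown)}")
    return spec["default"] | requested


if __name__ == "__main__":
    print(sorted(hosted_tools("xai", {"x_search"})))
    print(sorted(hosted_tools("ollama")))
```

The point the structure makes is the one in the text: availability is keyed on the provider, so the same model name behind a different provider can get a different tool set.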
Cache-safe task routing
Routing lets yottacode run isolated, throwaway work (subagents and
history compaction) on a cheap fast model while your main
conversation stays on your chosen smart model. It is opt-in via the
[router] block in ~/.yottacode/config.toml:
[router]
mode = "auto" # off | manual | auto
fast_model = "anthropic:claude-haiku-4-5"
smart_model = "anthropic:claude-opus-4-6"
fast_model / smart_model use the same "<provider>" or
"<provider>:<model>" grammar as the multi-provider router’s
candidates. Both are required when mode is not off. Provider names
refer to your [[providers]] blocks, and the model must be listed in
that provider’s models (typos are rejected at load time).
Why this saves money (and never costs more)
In an agentic loop the dominant cost is re-sending the full context (system prompt + files + history) on every turn. Prompt caching makes repeat turns on the same model cheap: cache reads are a fraction of the input price. Switching the main-thread model mid-conversation would throw that cache away on both models and cost more, so yottacode never does it.
Routing only ever targets contexts that never shared the main thread’s cache in the first place:
- Subagents each build a fresh, isolated context window.
- Summarization / compaction is a single isolated call.
Running those on the fast model is a pure saving with zero cache churn. Your interactive turns are untouched.
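A back-of-the-envelope check of the "pure saving" claim: an isolated call has no cached prefix to reuse, so it pays the full input price whichever model runs it, and the only variable is that model's per-token rate. The prices and token count below are made-up placeholders, not real rates for any model.

```python
def isolated_call_input_cost(prompt_tokens: int, usd_per_million_input: float) -> float:
    """Input cost of a call with no cached prefix (a fresh subagent or a compaction call)."""
    return prompt_tokens * usd_per_million_input / 1_000_000


# Hypothetical placeholder prices in USD per million input tokens.
SMART_PRICE = 15.0
FAST_PRICE = 1.0

if __name__ == "__main__":
    tokens = 40_000  # illustrative size of one subagent's fresh context
    smart = isolated_call_input_cost(tokens, SMART_PRICE)
    fast = isolated_call_input_cost(tokens, FAST_PRICE)
    print(f"smart: ${smart:.2f}  fast: ${fast:.2f}")  # prints "smart: $0.60  fast: $0.04"
```

The main thread does not appear in this comparison at all: its model never changes, so its cached prefix keeps being read at the cache-read rate either way.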
Modes
| mode | Behavior |
|---|---|
| off (default) | Routing disabled. Everything runs on your active model. Fully backward compatible. |
| manual | Resolves fast_model / smart_model, but only routes a subagent when its definition declares an explicit model: (see subagents.md). Non-annotated agents inherit your active model, exactly as with routing off. |
| auto | Routes by the agent’s nature: read-only / search subagents (the Explore and Plan built-ins) and summarization → fast_model; everything else (the general-purpose and verification built-ins, or any agent that can mutate/run) → smart_model. |
The auto heuristic is deterministic and free: it inspects each
agent’s declared tool allowlist, with no extra model call to classify
the task. An agent restricted to read-only tools (read_file, grep, glob,
git read subcommands, etc.) routes to fast_model; an agent that can
mutate the workspace or run commands (run_bash, write_file, …)
routes to smart_model. An explicit model: on an agent definition
always wins over the heuristic. (Your main conversation is never
affected either way; only subagents and summarization are routed.)
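The mode table and the auto heuristic can be condensed into one pure function. This is a sketch, not yottacode's source: the Agent type and choose_model name are hypothetical, an agent with no declared allowlist is assumed to be unrestricted (so it routes to the smart model), and an explicit model: value is returned unchanged because how it is resolved is defined in subagents.md, not here. The read-only set lists only the tool names given on this page; git read subcommands are left out because their tool name is not specified.

```python
from __future__ import annotations

from dataclasses import dataclass

# Read-only tool names taken from this page; git read subcommands omitted.
READ_ONLY_TOOLS = frozenset({"read_file", "grep", "glob"})


@dataclass
class Agent:
    name: str
    tools: frozenset[str] | None = None  # None: no declared allowlist (assumed unrestricted)
    model: str | None = None             # explicit `model:` from the agent definition


def choose_model(mode: str, agent: Agent, active: str, fast: str, smart: str) -> str:
    """Pick the model for a subagent under the off / manual / auto modes."""
    if mode == "off":
        return active                    # routing disabled: everything on the active model
    if agent.model is not None:
        return agent.model               # an explicit model always wins over the heuristic
    if mode == "manual":
        return active                    # non-annotated agents inherit the active model
    # auto: a read-only allowlist goes to the fast model; anything that can mutate or run,
    # or has no declared allowlist, goes to the smart model.
    if agent.tools is not None and agent.tools <= READ_ONLY_TOOLS:
        return fast
    return smart


if __name__ == "__main__":
    explore = Agent("Explore", tools=frozenset({"read_file", "grep"}))
    general = Agent("general-purpose", tools=frozenset({"read_file", "run_bash", "write_file"}))
    for agent in (explore, general):
        print(agent.name, choose_model("auto", agent, "active-model", "fast-model", "smart-model"))
```

Because the decision is a subset check on a static allowlist, it costs nothing at runtime and always gives the same answer for the same agent definition.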
Seeing what ran where
The model a subagent ran on is shown in the /subagents picker and on
each subagent’s completion card (… · on claude-haiku-4-5), so you can
confirm at a glance that a search subagent used the fast model and a
heavier one used the smart model.
Note: yottacode does not yet aggregate per-model token totals or cost across a session; token figures shown are per-subagent estimates.
Relationship to the multi-provider router
The same [router] block also hosts the multi-provider failover
router (enabled, candidates, policy, health knobs), which
dispatches each main-thread turn across candidates with fallback. That
is a separate, orthogonal feature: failover is about resilience across
providers; task routing (mode / fast_model / smart_model) is about
spending less on isolated work. They can be configured independently.
No silent fallback
If the model or base URL is missing, yottacode exits with a clear error. It does not silently default to localhost or a paid cloud provider.
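The fail-fast rule amounts to refusing to substitute defaults. A minimal sketch, assuming a hypothetical require helper and ConfigError type: it reports every missing setting at once and never falls back to a localhost or cloud URL. It deliberately says nothing about how yottacode orders the flag, environment variable, and config file, since this page does not state that order.

```python
from __future__ import annotations

import os


class ConfigError(Exception):
    """Raised when a required setting is missing (hypothetical error type)."""


def require(settings: dict[str, str | None]) -> dict[str, str]:
    """Return the settings if all are present; otherwise raise, naming each missing one.

    There are intentionally no defaults here: no localhost, no cloud endpoint.
    """
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise ConfigError("missing required setting(s): " + ", ".join(missing))
    return {name: value for name, value in settings.items() if value}


if __name__ == "__main__":
    try:
        require({
            "model": os.environ.get("YOTTACODE_MODEL"),
            "base_url": None,  # e.g. nothing supplied via --base-url or the config file
        })
    except ConfigError as err:
        print(f"error: {err}")  # a real CLI would exit non-zero at this point
```

Collecting all missing names before raising means one run tells you everything you still need to set.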
Choosing a model
Practical starting points:
- Local/privacy-first: Ollama with Qwen, Llama, or DeepSeek models
- General coding: a strong OpenAI-compatible coding model
- Deep planning: a reasoning model, with higher latency/cost
- Scripting/CI: cheaper fast model plus low --max-iterations
Use /doctor or yottacode doctor when a model is configured but not visible to the provider.