AI Model Configuration
Trinity uses a tiered model system to balance quality and cost across different operations. You can configure which models are used at each tier.
Model Tiers
Trinity has four tiers, each suited to different types of work:
Reasoning Tier
The most capable tier, used for tasks requiring deep thinking:
- Complex story implementation (difficulty 4-5)
- Full PRD generation (architect phase)
- Codebase audits
- Architecture analysis
Minimum intelligence level: Opus-class models
Standard Tier
The everyday workhorse, used for routine tasks:
- Regular story implementation (difficulty 1-3)
- Analyst and implementer phases for simpler stories
- Story auditing
Minimum intelligence level: Sonnet-class models
Fast Tier
Quick, bounded judgment tasks:
- Onboarding Q&A
- Documentation generation
- Roadmap section generation
- PRD editing
- Calibrator and dependency-mapper pipeline phases
Minimum intelligence level: 2 (Haiku-class or higher)
Micro Tier
Mechanical, low-intelligence tasks:
- Classification and scoring
- Preflight checklists
- Recap search triage
- Status checks
Minimum intelligence level: 1 (nano / tiny local models allowed)
Providers
Trinity supports six AI providers:
Anthropic
The primary provider:
- Claude Opus 4.7 — highest capability, default for the reasoning tier
- Claude Opus 4.6 — reasoning-class, available for the reasoning tier
- Claude Sonnet 4.6 — balanced performance, used for standard and fast tiers
- Claude Haiku 4.5 — fast and efficient, used for micro tier
DeepSeek
Alternative provider, V4 generation (released April 2026):
- DeepSeek V4 Pro — frontier reasoning in the GPT-5 / Gemini-3.0-Pro class; sits at the reasoning tier as an Opus alternative
- DeepSeek V4 Flash — general workhorse with 1M context, with thinking and tool use available in both modes; sits at the standard / fast tier as a Sonnet alternative
Moonshot (Kimi)
- Kimi K2.6 — latest flagship, 262k context
- Kimi K2.5 — reasoning-capable model
- Kimi K2 Thinking — reasoning-capable model
- Kimi K2 Turbo — faster, lower-intelligence variant
Z.ai (GLM)
Zhipu GLM models via Z.ai's Claude Code integration:
- GLM 4.7 — flagship agentic coder
- GLM 4.6 — reasoning-class
- GLM 4.5 Air — faster, lower-intelligence variant
Qwen (Alibaba Cloud)
Qwen3 family via Alibaba DashScope's Claude Code integration:
- Qwen3 Coder Plus — flagship coder (context cache, tiered billing)
- Qwen3.5 Plus — current-gen general flagship (1M context, multimodal)
- Qwen3 Max — Max-series, maximum quality for complex reasoning
- Qwen3 Coder Next — balanced coder (262k context, agentic tool calling)
- Qwen3 Coder Flash — fast coder
- Qwen3.5 Flash — current-gen fast general (1M context, multimodal)
Ollama
Local model support for offline/private execution:
- Qwen3 Coder Next — code-focused model
- Qwen3 Coder — code-focused model
- GLM 4.7 — general-purpose
- DeepSeek Coder — code-focused
- Qwen 3.5 9B — small general-purpose model
Configuring Models
- Navigate to Settings
- Find the AI Models section
- For each tier, pick a harness → provider → model cascade:
  - Harness — the CLI/runtime that actually executes the agent. Today the only choice is Claude Code CLI, which can talk to every provider listed above via Anthropic-compatible endpoints.
  - Provider — which vendor runs the model.
  - Model — the specific model within that provider.
Switching providers restores the last model you picked for that (harness, provider) pair, so you can A/B between two providers without losing your selection. Settings are stored as harness:provider:model strings (e.g., claude-code:anthropic:claude-opus-4-7).
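As an illustration of the stored format, here is a minimal sketch that splits a harness:provider:model string into its parts and joins it back. The names `ModelSetting`, `parse_model_setting`, and `format_model_setting` are hypothetical and not part of Trinity; keeping extra colons inside the model part is also an assumption.

```python
from typing import NamedTuple


class ModelSetting(NamedTuple):
    """Hypothetical container for one stored tier setting."""
    harness: str
    provider: str
    model: str


def parse_model_setting(value: str) -> ModelSetting:
    """Split a 'harness:provider:model' string into its three parts."""
    # maxsplit=2 leaves any further colons inside the model part.
    # Whether real model IDs can contain colons is an assumption here.
    parts = value.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected harness:provider:model, got {value!r}")
    return ModelSetting(*parts)


def format_model_setting(setting: ModelSetting) -> str:
    """Join the three parts back into the stored string form."""
    return ":".join(setting)


setting = parse_model_setting("claude-code:anthropic:claude-opus-4-7")
print(setting.harness, setting.provider, setting.model)
```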
Defaults
| Tier | Default Model |
|---|---|
| Reasoning | Claude Opus 4.7 |
| Standard | Claude Sonnet 4.6 |
| Fast | Claude Sonnet 4.6 |
| Micro | Claude Haiku 4.5 |
Retired models appear greyed out in the picker and aren't selectable. Stored settings that reference them keep working (and appear correctly in historical metrics) until you change them.
Dynamic Tier Resolution
Some operations dynamically choose between tiers based on context:
- Story execution — stories with difficulty 4+ or large surface area automatically use the reasoning tier instead of standard
- Planning pipeline — the architect phase uses reasoning, while calibrator and dependency-mapper use fast
This means a low-difficulty story costs less than a high-difficulty one, even though both go through the same pipeline.
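The rules above can be sketched roughly as follows. This is an illustration only: `story_tier`, `PLANNING_PHASE_TIERS`, and the `large_surface_area` flag are hypothetical names, and how Trinity actually measures surface area is not described here.

```python
def story_tier(difficulty: int, large_surface_area: bool = False) -> str:
    """Pick the tier for story execution.

    Difficulty 4+ or a large surface area escalates to the reasoning
    tier; everything else stays on standard.
    """
    if difficulty >= 4 or large_surface_area:
        return "reasoning"
    return "standard"


# Planning pipeline phases and the tier each one uses (phase keys are
# illustrative spellings of the phases named in the text).
PLANNING_PHASE_TIERS = {
    "architect": "reasoning",
    "calibrator": "fast",
    "dependency-mapper": "fast",
}

print(story_tier(2))        # a routine story
print(story_tier(5))        # a hard story
print(story_tier(2, True))  # low difficulty but large surface area
```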
Tier Default Resolution
Trinity picks which model to use in this priority order (highest wins):
- Explicit — if the caller passes a specific model, that wins
- Entity / job override — a model pinned on a specific story or release, or one passed when starting a run
- User per-project — your Project Settings → Mine model choice
- Project default — the project's configured tier model
- User per-scope — your Team Settings → Mine model choice
- Team default — the team's configured tier model
- Global default — your personal Settings → AI Models choice
- Built-in fallback — TIER_FALLBACK_MODEL_ID, the built-in Anthropic defaults used when no layer above has been configured
This is resolution-time only — it's not a runtime "retry with a fallback on failure." If a model call fails, the operation surfaces the error (the caller handles retries, usually by going through the feedback pipeline or job-level retry).
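One way to picture this is a walk down the layers that stops at the first one with a value. The sketch below is an assumption about the shape of that logic, not Trinity's actual code: the layer keys, `resolve_model`, and the `fallback` argument (standing in for the TIER_FALLBACK_MODEL_ID lookup) are all hypothetical.

```python
from typing import Mapping, Optional, Tuple

# Layers from highest to lowest priority, mirroring the list above.
LAYER_ORDER = [
    "explicit",
    "entity_or_job_override",
    "user_per_project",
    "project_default",
    "user_per_scope",
    "team_default",
    "global_default",
]


def resolve_model(
    layers: Mapping[str, Optional[str]], fallback: str
) -> Tuple[str, str]:
    """Return (layer_name, model) for the first configured layer.

    `fallback` stands in for the built-in default of the tier being
    resolved. This is resolution only; nothing here retries a failed call.
    """
    for name in LAYER_ORDER:
        model = layers.get(name)
        if model:
            return name, model
    return "tier_fallback", fallback


configured = {
    "project_default": "claude-code:anthropic:claude-opus-4-7",
    "team_default": "some-other-model",  # placeholder value
}
print(resolve_model(configured, fallback="built-in-default"))
```

In this example the project default wins over the team default because it sits higher in the list.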
Effort Levels
Recent Anthropic models accept an effort parameter that controls how hard the model thinks before answering. Each model supports a different subset of the ladder:
- Opus 4.7 — full 5-level: low | medium | high | xhigh | max
- Opus 4.6, Sonnet 4.6 — 4-level: low | medium | high | max (no xhigh)
- Haiku 4.5 — no effort parameter; the harness skips injecting it
Trinity's harness clamps every requested effort to what the chosen model supports, so an xhigh request against Sonnet 4.6 lands at high. Effort requests get recorded in the ai_events table for metrics.
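A minimal sketch of the clamping behaviour, assuming the harness steps down to the nearest supported level at or below the request. The text only confirms the xhigh to high case for Sonnet 4.6, so the general rule, the `clamp_effort` name, and the table keys are assumptions.

```python
from typing import Optional

# Full ladder, lowest to highest.
EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

# Supported subsets per model, taken from the list above.
SUPPORTED_EFFORTS = {
    "Opus 4.7": {"low", "medium", "high", "xhigh", "max"},
    "Opus 4.6": {"low", "medium", "high", "max"},
    "Sonnet 4.6": {"low", "medium", "high", "max"},
    "Haiku 4.5": set(),  # no effort parameter at all
}


def clamp_effort(model: str, requested: str) -> Optional[str]:
    """Return the effort to send, or None when it should be omitted."""
    supported = SUPPORTED_EFFORTS[model]
    if not supported:
        return None  # caller skips injecting the parameter
    rank = EFFORT_LADDER.index(requested)
    # Walk down from the requested level to the first supported one.
    for level in reversed(EFFORT_LADDER[: rank + 1]):
        if level in supported:
            return level
    # Nothing at or below the request: use the lowest supported level.
    return min(supported, key=EFFORT_LADDER.index)


print(clamp_effort("Sonnet 4.6", "xhigh"))
print(clamp_effort("Haiku 4.5", "high"))
```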
Timeout Tiers
Each model tier has associated timeout limits:
| Tier | Timeout |
|---|---|
| Micro | 5 minutes |
| Fast (Short) | 15 minutes |
| Standard (Default) | 30 minutes |
| Reasoning (Long) | 1 hour |
Operations that exceed their timeout are cancelled and marked as failed.
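To make the table concrete, the sketch below maps each tier to its limit and wraps an operation with `asyncio.wait_for`. The wrapper `run_with_tier_timeout` and the result dictionaries are hypothetical; Trinity's own cancellation mechanism is not described here and may work differently.

```python
import asyncio
from datetime import timedelta
from typing import Any, Awaitable, Dict, Mapping

# Timeout per tier, copied from the table above.
TIER_TIMEOUTS: Dict[str, timedelta] = {
    "micro": timedelta(minutes=5),
    "fast": timedelta(minutes=15),
    "standard": timedelta(minutes=30),
    "reasoning": timedelta(hours=1),
}


async def run_with_tier_timeout(
    tier: str,
    operation: Awaitable[Any],
    timeouts: Mapping[str, timedelta] = TIER_TIMEOUTS,
) -> Dict[str, Any]:
    """Await `operation`, cancelling it if the tier's limit is exceeded."""
    limit = timeouts[tier].total_seconds()
    try:
        result = await asyncio.wait_for(operation, timeout=limit)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the operation at this point.
        return {"status": "failed", "reason": "timeout"}
    return {"status": "ok", "result": result}


async def quick_check() -> str:
    """Stand-in for a micro-tier operation."""
    await asyncio.sleep(0)
    return "done"


print(asyncio.run(run_with_tier_timeout("micro", quick_check())))
```

The `timeouts` argument exists only so that shorter limits can be substituted when experimenting.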
Cost Considerations
Model costs vary significantly:
- Reasoning tier is the most expensive — use it only where quality matters (Trinity does this automatically for hard stories)
- Standard tier offers the best quality-to-cost ratio for most work
- Fast tier is cost-effective for bounded tasks that don't need deep reasoning
- Micro tier is very cheap, used for mechanical classification
The Metrics dashboard tracks token usage and cost by operation, helping you understand where your budget goes.
Tips
- Start with defaults — the default configuration is well-balanced for most projects
- Use DeepSeek for cost savings — if you have a DeepSeek API key, using it for the fast or standard tier can reduce costs significantly
- Use Ollama for privacy — local models keep all data on your machine, but expect slower execution and potentially lower quality
- Monitor the cost tab — check Metrics → Cost to understand your spending patterns before making changes
- Don't downgrade reasoning — the reasoning tier handles your most complex stories; using a less capable model here leads to more failures and retries, which can cost more in the end