A multi-signal smart model router for pi. Provider-agnostic, zero-config, cache-aware.
User sends a prompt
│
▼
┌─────────────────────────┐
│ Duplicate Provider │◄─── same model on multiple providers?
│ Resolution │ ask user which to prefer
└────────────┬────────────┘
│
▼
┌───────────────┴───────────────┐
│ Is this the first prompt? │
└───────┬─────────────┬─────────┘
yes│ │no
▼ ▼
╔═══════════════╗ ┌─────────────────────────┐
║ PHASE 1 ║ │ PHASE 2 │
║ Initial Route║ │ Mid-Session Monitoring │
╚════════╤══════╝ └────────────┬────────────┘
│ │
▼ ▼
A 4-layer waterfall classifier. Each layer either produces a confident decision or falls through to the next.
User prompt
│
▼
┌─────────────────────────────────────────────────────────┐
│ LAYER 1 — Keyword Rules latency: <0.1ms │
│ │
│ Exact substring match against curated keyword lists. │
│ "what is" → light "implement" → medium │
│ "refactor the entire" → heavy │
│ │
│ Match? ──yes──► TIER DECIDED │
│ │ │
│ no │
│ ▼ │
│ LAYER 2 — Heuristic Analysis latency: <1ms │
│ │
│ Scans prompt for 5 signal categories: │
│ ┌─────────────────┬──────────────────────────────┐ │
│ │ Signal │ What it detects │ │
│ ├─────────────────┼──────────────────────────────┤ │
│ │ Code patterns │ ```, function, class, import │ │
│ │ Error patterns │ Error:, Traceback, segfault │ │
│ │ Multi-step │ "and then", "first", "1. 2."│ │
│ │ File paths │ .ts, src/, ~/, ./ │ │
│ │ Architecture │ microservice, design pattern │ │
│ └─────────────────┴──────────────────────────────┘ │
│ │
│ Each signal adds to 4 dimensions (0-5 scale): │
│ complexity · codeInvolvement · toolUsage · context │
│ │
│ Dimension average → tier: │
│ avg ≤ 1.2 → light avg ≤ 3.0 → medium → heavy │
│ │
│ Signal count → confidence score (0.0-1.0): │
│ more distinct signals = higher confidence │
│ │
│ confidence ≥ 0.6? ──yes──► TIER DECIDED │
│ │ │
│ no │
│ ▼ │
│ LAYER 3 — Session Context latency: <0.1ms │
│ │
│ Boosts score based on what happened *so far*: │
│ many tool calls (+1.0) many files touched (+1.0) │
│ previously escalated (+1.0) │
│ │
│ boost ≥ 2.0 → heavy boost ≥ 1.5 → medium │
│ Otherwise → default (medium) │
│ │
│ AI fallback enabled? ──no──► TIER DECIDED │
│ │ │
│ yes │
│ ▼ │
│ LAYER 4 — AI Fallback latency: ~3-5s │
│ │
│ Sends prompt to cheapest model with a structured │
│ scoring prompt. Returns JSON with the same 4 dims. │
│ Aborts after 10s timeout → falls back to heuristic. │
│ │
│ avg ≤ 1.5 → light avg ≤ 3.2 → medium → heavy │
│ │
│ ► TIER DECIDED │
└─────────────────────────────────────────────────────────┘
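The waterfall above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual source: the function names, signal regexes, and scoring weights are assumptions chosen to match the thresholds in the diagram, and the AI fallback layer is elided to a comment.

```typescript
// Sketch of the 4-layer waterfall router (hypothetical names and weights).
type Tier = "light" | "medium" | "heavy";

interface Session {
  toolCalls: number;
  filesTouched: number;
  escalated: boolean;
}

// Layer 1 — curated keyword rules, checked in priority order.
const KEYWORDS: Array<[string, Tier]> = [
  ["refactor the entire", "heavy"],
  ["implement", "medium"],
  ["what is", "light"],
];

// Layer 2 — the 5 signal categories from the table above.
function heuristics(prompt: string): { avg: number; confidence: number } {
  const signals = [
    /```|function|class|import/, // code patterns
    /Error:|Traceback|segfault/, // error patterns
    /and then|first|\d\.\s/,     // multi-step
    /\.ts|src\/|~\/|\.\//,       // file paths
    /microservice|design pattern/, // architecture
  ];
  const hits = signals.filter((re) => re.test(prompt)).length;
  // More distinct signals = higher dimension average and higher confidence.
  return { avg: Math.min(5, hits * 1.2), confidence: Math.min(1, hits / 3) };
}

function route(prompt: string, session: Session): Tier {
  // Layer 1 — exact substring match (<0.1ms).
  const p = prompt.toLowerCase();
  for (const [kw, tier] of KEYWORDS) if (p.includes(kw)) return tier;

  // Layer 2 — heuristic signals averaged across dimensions (<1ms).
  const { avg, confidence } = heuristics(prompt);
  if (confidence >= 0.6) {
    return avg <= 1.2 ? "light" : avg <= 3.0 ? "medium" : "heavy";
  }

  // Layer 3 — session context boost (<0.1ms).
  let boost = 0;
  if (session.toolCalls > 5) boost += 1.0;
  if (session.filesTouched > 3) boost += 1.0;
  if (session.escalated) boost += 1.0;
  if (boost >= 2.0) return "heavy";
  if (boost >= 1.5) return "medium";

  // Layer 4 (AI fallback, ~3-5s) would run here if enabled; otherwise
  // fall through to the default tier.
  return "medium";
}
```

Each layer is strictly cheaper than the next, so the common case (a keyword or strong heuristic hit) never pays for the expensive path.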
Lightweight re-evaluation on every prompt. Only suggests upgrades, never downgrades — cache invalidation isn't worth it.
Subsequent prompt
│
▼
Run Layers 1+2 only (<1ms)
│
▼
┌──────────────────────────────┐
│ Wanted tier > current tier? │
└──────┬───────────────┬──────┘
no│ │yes
│ ▼
│ ┌──────────────────────────┐
│ │ Cooldown active? │◄── at least 3 turns
│ │ (< 3 turns since last) │ since last suggestion
│ └────┬─────────────┬───────┘
│ yes no
│ │ ▼
│ │ ┌──────────────────────┐
│ │ │ Show confirm dialog │
│ │ │ │
│ │ │ Current model + tier │
│ │ │ Suggested model + tier│
│ │ │ Cost comparison │
│ │ │ Cache warning │
│ │ │ │
│ │ │ Auto-dismiss: 8s │
│ │ └────┬──────────┬───────┘
│ │ │yes │no
│ │ ▼ ▼
│ │ Switch model Stay, notify
│ │ log decision user can /triage reroute
▼ ▼
Do nothing — cache stays warm
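The upgrade-only gate plus cooldown reduces to a few comparisons. A sketch, with hypothetical names; the tier ranking and the 3-turn cooldown come from the flow above:

```typescript
// Sketch of the mid-session upgrade check (hypothetical names).
const TIER_RANK = { light: 0, medium: 1, heavy: 2 } as const;
type Tier = keyof typeof TIER_RANK;

function maybeSuggestUpgrade(
  wanted: Tier,
  current: Tier,
  turnsSinceLastSuggestion: number,
  cooldownTurns = 3,
): boolean {
  // Upgrades only — downgrading would discard the provider's warm prompt cache.
  if (TIER_RANK[wanted] <= TIER_RANK[current]) return false;
  // Cooldown: at most one suggestion every `cooldownTurns` turns.
  if (turnsSinceLastSuggestion < cooldownTurns) return false;
  return true; // show the confirm dialog (auto-dismisses after 8s)
}
```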
Runs on every turn_end. Independent of prompt routing.
Tool call fails (isError=true)
│
▼
consecutiveErrors++
│
▼
consecutiveErrors ≥ threshold (default: 2)?
│
yes ▼
escalationCount < maxEscalations (default: 2)?
│
yes ▼
Escalate: light → medium → heavy
Log + notify user
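The escalation check above amounts to two threshold tests and a tier bump. A sketch under assumed names; the defaults match the flow (errorThreshold: 2, maxEscalations: 2):

```typescript
// Sketch of the error-escalation check run on turn_end (hypothetical names).
type Tier = "light" | "medium" | "heavy";
const NEXT: Record<Tier, Tier> = { light: "medium", medium: "heavy", heavy: "heavy" };

interface EscalationState {
  consecutiveErrors: number;
  escalations: number;
}

function checkEscalation(
  state: EscalationState,
  tier: Tier,
  errorThreshold = 2,
  maxEscalations = 2,
): Tier {
  if (state.consecutiveErrors < errorThreshold) return tier; // not enough failures yet
  if (state.escalations >= maxEscalations) return tier;      // escalation budget spent
  state.escalations++;
  state.consecutiveErrors = 0; // reset the counter after escalating
  return NEXT[tier];           // light → medium → heavy
}
```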
On session start, if the same model exists on multiple providers:
models.json scanned
│
▼
┌──────────────────────────────┐
│ Same model on 2+ providers? │
└──────┬───────────────┬──────┘
no│ │yes
│ ▼
│ Show interactive dialog:
│ "glm-5.1 available from:
│ 1. openrouter — $0.95/$3.15 (cheapest)
│ 2. zai — $1.40/$4.40
│ Use openrouter?"
│ │yes/timeout │no
│ ▼ ▼
│ Pick cheapest Pick user's choice
│ │ │
▼ ▼ ▼
Feed preferred providers into auto-discovery
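Duplicate detection is a group-by over the discovered model list. A sketch with an assumed entry shape (the real models.json schema may differ); sorting each group cheapest-first makes the cheapest provider the default on timeout, as in the dialog above:

```typescript
// Sketch of duplicate-provider detection (hypothetical entry shape).
interface ModelEntry {
  id: string;
  provider: string;
  inputCost: number;  // $ per 1M input tokens
  outputCost: number; // $ per 1M output tokens
}

function findDuplicates(models: ModelEntry[]): Map<string, ModelEntry[]> {
  // Group entries by model id.
  const byId = new Map<string, ModelEntry[]>();
  for (const m of models) {
    const list = byId.get(m.id) ?? [];
    list.push(m);
    byId.set(m.id, list);
  }
  // Keep only ids offered by 2+ providers; sort each group cheapest-first
  // so index 0 is the default choice on dialog timeout.
  const dups = new Map<string, ModelEntry[]>();
  for (const [id, list] of byId) {
    if (list.length >= 2) {
      list.sort((a, b) => a.inputCost + a.outputCost - (b.inputCost + b.outputCost));
      dups.set(id, list);
    }
  }
  return dups;
}
```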
Models are scored per-tier based on cost and capabilities:
For each discovered model:
│
├── tier=light/triage: score = inputCost + outputCost
│ (lowest cost wins)
│
├── tier=medium: score = |cost - medianCost| - bonuses
│ (closest to median, bonus for reasoning + context window)
│
└── tier=heavy: score = -cost - bonuses
(most expensive + reasoning required, penalize non-reasoning)
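The three scoring rules can be sketched as one function where a lower score wins. The bonus and penalty weights here are placeholders, not the extension's actual values; light and triage share the cost-only rule as noted above:

```typescript
// Sketch of per-tier model scoring (hypothetical weights; lower score = better).
interface Model {
  inputCost: number;
  outputCost: number;
  reasoning: boolean;
  contextWindow: number;
}

function score(
  m: Model,
  tier: "light" | "triage" | "medium" | "heavy",
  medianCost: number,
): number {
  const cost = m.inputCost + m.outputCost;
  // light/triage: lowest total cost wins.
  if (tier === "light" || tier === "triage") return cost;
  if (tier === "medium") {
    // medium: closest to the median cost, with bonuses for reasoning
    // support and a large context window (weights are assumptions).
    let bonus = 0;
    if (m.reasoning) bonus += 0.5;
    if (m.contextWindow >= 200_000) bonus += 0.5;
    return Math.abs(cost - medianCost) - bonus;
  }
  // heavy: most expensive wins, with a penalty for non-reasoning models
  // (penalty weight is an assumption).
  const penalty = m.reasoning ? 0 : 10;
  return -cost + penalty;
}
```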
session_start before_agent_start turn_end
│ │ │
├─ load config ├─ Phase 1 or 2 ├─ check escalation
├─ discover models ├─ run 4-layer router ├─ cost tracking
├─ resolve duplicates ├─ apply model └─ reset error counter
├─ auto-assign tiers └─ log + notify
└─ notify user
User presses Ctrl+P or /model
│
▼
model_select event fires
│
▼
triageIsSettingModel? ──yes──► ignore (triage's own switch)
│no
▼
Set userTookControl = true
Pause auto-routing
Notify: "/triage on to re-enable"
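The manual-override guard is a small piece of state handling. A sketch with assumed field names taken from the flow above:

```typescript
// Sketch of the manual-override guard on model_select (hypothetical names).
interface TriageState {
  triageIsSettingModel: boolean; // set while triage applies its own switch
  userTookControl: boolean;      // pauses auto-routing once true
}

function onModelSelect(state: TriageState, notify: (msg: string) => void): void {
  // Ignore model changes that triage itself just made.
  if (state.triageIsSettingModel) return;
  state.userTookControl = true; // auto-routing stays paused until /triage on
  notify("triage paused — run /triage on to re-enable");
}
```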
- Zero-delay fast path — keyword + heuristic covers 85%+ of prompts in <5ms
- Cache-aware — routes once, only suggests upgrades mid-session
- Provider-agnostic — auto-discovers from ~/.pi/agent/models.json
- Duplicate provider handling — interactive dialog when the same model is on multiple providers
- Multi-step escalation — light → medium → heavy on consecutive errors
- Cost tracking & logging — per-tier turn counts, optional JSON log
| Command | Purpose |
|---|---|
| `/triage status` | Full status: tiers, session, costs |
| `/triage history` | Session history |
| `/triage config` | Show resolved config |
| `/triage dry-run <prompt>` | Test routing without applying |
| `/triage discover` | Re-discover models |
| `/triage off/on` | Pause/resume |
| `/triage reset` | Re-route on next prompt |
All fields optional. Auto-discovery handles models.
```json
{
  "heuristicConfidenceThreshold": 0.6,
  "aiFallbackEnabled": true,
  "aiFallbackTimeoutMs": 10000,
  "midSessionMonitoring": {
    "enabled": true,
    "minTurnsBeforeSuggest": 2,
    "suggestTimeout": 8000,
    "suggestCooldownTurns": 3
  },
  "escalation": {
    "enabled": true,
    "errorThreshold": 2,
    "maxEscalations": 2
  },
  "logging": { "enabled": false }
}
```

Override specific tiers:
```json
{
  "tiers": {
    "heavy": { "provider": "openrouter", "model": "z-ai/glm-5.1" }
  }
}
```

Install by symlinking the extension into pi's extensions directory:

```shell
ln -s ~/path/to/pi-triage ~/.pi/agent/extensions/pi-triage
```

MIT