pi-triage v3

A multi-signal smart model router for pi. Provider-agnostic, zero-config, cache-aware.

How It Works

                        User sends a prompt
                               │
                               ▼
                  ┌─────────────────────────┐
                  │    Duplicate Provider    │◄─── same model on multiple providers?
                  │      Resolution          │     ask user which to prefer
                  └────────────┬────────────┘
                               │
                               ▼
               ┌───────────────┴───────────────┐
               │     Is this the first prompt?  │
               └───────┬─────────────┬─────────┘
                    yes│             │no
                       ▼             ▼
              ╔═══════════════╗  ┌─────────────────────────┐
              ║   PHASE 1     ║  │      PHASE 2             │
              ║  Initial Route║  │  Mid-Session Monitoring  │
              ╚════════╤══════╝  └────────────┬────────────┘
                       │                      │
                       ▼                      ▼

Phase 1: Initial Routing (first prompt only)

A 4-layer waterfall classifier. Each layer either produces a confident decision or falls through to the next.

  User prompt
       │
       ▼
 ┌─────────────────────────────────────────────────────────┐
 │  LAYER 1 — Keyword Rules               latency: <0.1ms │
 │                                                         │
 │  Exact substring match against curated keyword lists.    │
 │  "what is" → light    "implement" → medium              │
 │  "refactor the entire" → heavy                           │
 │                                                         │
 │  Match? ──yes──► TIER DECIDED                           │
 │       │                                                  │
 │      no                                                  │
 │       ▼                                                  │
 │  LAYER 2 — Heuristic Analysis          latency: <1ms    │
 │                                                         │
 │  Scans prompt for 5 signal categories:                   │
 │  ┌─────────────────┬──────────────────────────────┐      │
 │  │ Signal          │ What it detects              │      │
 │  ├─────────────────┼──────────────────────────────┤      │
 │  │ Code patterns   │ ```, function, class, import │      │
 │  │ Error patterns  │ Error:, Traceback, segfault  │      │
 │  │ Multi-step      │ "and then", "first", "1. 2."│      │
 │  │ File paths      │ .ts, src/, ~/, ./           │      │
 │  │ Architecture    │ microservice, design pattern │      │
 │  └─────────────────┴──────────────────────────────┘      │
 │                                                         │
 │  Each signal adds to 4 dimensions (0-5 scale):           │
 │    complexity · codeInvolvement · toolUsage · context    │
 │                                                         │
 │  Dimension average → tier:                               │
 │    avg ≤ 1.2 → light    avg ≤ 3.0 → medium    → heavy  │
 │                                                         │
 │  Signal count → confidence score (0.0-1.0):              │
 │    more distinct signals = higher confidence             │
 │                                                         │
 │  confidence ≥ 0.6? ──yes──► TIER DECIDED                │
 │       │                                                  │
 │      no                                                  │
 │       ▼                                                  │
 │  LAYER 3 — Session Context             latency: <0.1ms  │
 │                                                         │
 │  Boosts score based on what happened *so far*:           │
 │    many tool calls (+1.0)    many files touched (+1.0)   │
 │    previously escalated (+1.0)                           │
 │                                                         │
 │  boost ≥ 2.0 → heavy    boost ≥ 1.5 → medium           │
 │  Otherwise → default (medium)                            │
 │                                                         │
 │  AI fallback enabled? ──no──► TIER DECIDED              │
 │       │                                                  │
 │      yes                                                 │
 │       ▼                                                  │
 │  LAYER 4 — AI Fallback               latency: ~3-5s     │
 │                                                         │
 │  Sends prompt to cheapest model with a structured        │
 │  scoring prompt. Returns JSON with the same 4 dims.      │
 │  Aborts after 10s timeout → falls back to heuristic.    │
 │                                                         │
 │  avg ≤ 1.5 → light    avg ≤ 3.2 → medium    → heavy    │
 │                                                         │
 │  ► TIER DECIDED                                         │
 └─────────────────────────────────────────────────────────┘
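The waterfall above can be sketched roughly as follows. All names, keyword lists, and signal regexes are illustrative subsets, not the extension's actual code, and the heuristic here collapses the four dimensions into a single signal count; Layers 3 and 4 are elided down to the default tier.

```typescript
type Tier = "light" | "medium" | "heavy";

// Layer 1: exact substring match against curated keyword lists (illustrative subsets).
const KEYWORDS: Record<Tier, string[]> = {
  light: ["what is"],
  medium: ["implement"],
  heavy: ["refactor the entire"],
};

function keywordLayer(prompt: string): Tier | null {
  const p = prompt.toLowerCase();
  for (const tier of ["heavy", "medium", "light"] as Tier[]) {
    if (KEYWORDS[tier].some((k) => p.includes(k))) return tier;
  }
  return null; // fall through to Layer 2
}

// Layer 2: count distinct signal categories; the count drives both the
// dimension average (simplified here: one hit is roughly one point) and confidence.
const SIGNALS: RegExp[] = [
  /```|\bfunction\b|\bclass\b|\bimport\b/, // code patterns
  /Error:|Traceback|segfault/,             // error patterns
  /and then|\bfirst\b|\b1\.\s/,            // multi-step instructions
  /\.ts\b|src\/|~\/|\.\//,                 // file paths
  /microservice|design pattern/,           // architecture terms
];

function heuristicLayer(prompt: string): { tier: Tier; confidence: number } {
  const hits = SIGNALS.filter((re) => re.test(prompt)).length;
  const avg = hits; // stand-in for the 4-dimension average on the 0-5 scale
  const tier: Tier = avg <= 1.2 ? "light" : avg <= 3.0 ? "medium" : "heavy";
  return { tier, confidence: hits / SIGNALS.length }; // more signals, more confidence
}

function route(prompt: string): Tier {
  const kw = keywordLayer(prompt);
  if (kw) return kw;                      // Layer 1 decided
  const h = heuristicLayer(prompt);
  if (h.confidence >= 0.6) return h.tier; // Layer 2 decided
  return "medium";                        // Layers 3-4 elided: default tier
}
```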

Phase 2: Mid-Session Monitoring (subsequent prompts)

Lightweight re-evaluation on every prompt. Only suggests upgrades, never downgrades: switching models would throw away the provider's warm prompt cache, which is rarely worth it.

  Subsequent prompt
       │
       ▼
  Run Layers 1+2 only (<1ms)
       │
       ▼
  ┌──────────────────────────────┐
  │ Wanted tier > current tier?  │
  └──────┬───────────────┬──────┘
        no│              │yes
         │               ▼
         │    ┌──────────────────────────┐
         │    │ Cooldown active?         │◄── at least 3 turns
         │    │ (< 3 turns since last)   │    since last suggestion
         │    └────┬─────────────┬───────┘
         │        yes            no
         │         │              ▼
         │         │    ┌──────────────────────┐
         │         │    │ Show confirm dialog   │
         │         │    │                       │
         │         │    │ Current model + tier  │
         │         │    │ Suggested model + tier│
         │         │    │ Cost comparison       │
         │         │    │ Cache warning         │
         │         │    │                       │
         │         │    │ Auto-dismiss: 8s      │
         │         │    └────┬──────────┬───────┘
         │         │         │yes        │no
         │         │         ▼           ▼
         │         │    Switch model   Stay, notify
         │         │    log decision   user can /triage reroute
         ▼         ▼
    Do nothing — cache stays warm
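The upgrade check above boils down to a small pure function. The names and state shape here are assumptions for illustration:

```typescript
type Tier = "light" | "medium" | "heavy";
const TIER_RANK: Record<Tier, number> = { light: 0, medium: 1, heavy: 2 };

interface MonitorState {
  currentTier: Tier;
  turnsSinceLastSuggestion: number; // incremented once per turn elsewhere
}

// Upgrades only: a downgrade would abandon the provider's warm prompt cache.
function shouldSuggestUpgrade(
  state: MonitorState,
  wantedTier: Tier,
  cooldownTurns = 3,
): boolean {
  if (TIER_RANK[wantedTier] <= TIER_RANK[state.currentTier]) return false; // not an upgrade
  if (state.turnsSinceLastSuggestion < cooldownTurns) return false;        // cooldown active
  return true; // show the confirm dialog
}
```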

Error Escalation

Runs on every turn_end. Independent of prompt routing.

  Tool call fails (isError=true)
       │
       ▼
  consecutiveErrors++
       │
       ▼
  consecutiveErrors ≥ threshold (default: 2)?
       │
      yes ▼
  escalationCount < maxEscalations (default: 2)?
       │
      yes ▼
  Escalate: light → medium → heavy
  Log + notify user
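A minimal sketch of the counter logic, assuming an immutable state record. The field names mirror the flow above; everything else is illustrative:

```typescript
type Tier = "light" | "medium" | "heavy";
const NEXT_TIER: Record<Tier, Tier> = { light: "medium", medium: "heavy", heavy: "heavy" };

interface EscalationState {
  tier: Tier;
  consecutiveErrors: number;
  escalationCount: number;
}

// Runs on every turn_end, independent of prompt routing.
function checkEscalation(
  state: EscalationState,
  toolCallFailed: boolean,
  errorThreshold = 2,
  maxEscalations = 2,
): EscalationState {
  if (!toolCallFailed) return { ...state, consecutiveErrors: 0 }; // success resets the counter
  const errors = state.consecutiveErrors + 1;
  if (errors >= errorThreshold && state.escalationCount < maxEscalations) {
    return {
      tier: NEXT_TIER[state.tier], // light -> medium -> heavy
      consecutiveErrors: 0,
      escalationCount: state.escalationCount + 1,
    };
  }
  return { ...state, consecutiveErrors: errors };
}
```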

Duplicate Provider Resolution

On session start, if the same model exists on multiple providers:

  models.json scanned
       │
       ▼
  ┌──────────────────────────────┐
  │ Same model on 2+ providers?  │
  └──────┬───────────────┬──────┘
        no│              │yes
         │               ▼
         │    Show interactive dialog:
         │    "glm-5.1 available from:
         │     1. openrouter — $0.95/$3.15 (cheapest)
         │     2. zai — $1.40/$4.40
         │     Use openrouter?"
         │         │yes/timeout     │no
         │         ▼                ▼
         │    Pick cheapest    Pick user's choice
         │         │                │
         ▼         ▼                ▼
    Feed preferred providers into auto-discovery
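Detection plus the timeout default can be sketched as below, assuming per-million-token costs like the ones shown in the dialog:

```typescript
interface ModelEntry {
  provider: string;
  model: string;
  inputCost: number;  // $ per million input tokens
  outputCost: number; // $ per million output tokens
}

// Group entries from models.json by model id; groups with 2+ providers need resolution.
function findDuplicates(entries: ModelEntry[]): Map<string, ModelEntry[]> {
  const byModel = new Map<string, ModelEntry[]>();
  for (const e of entries) {
    byModel.set(e.model, [...(byModel.get(e.model) ?? []), e]);
  }
  return new Map([...byModel].filter(([, group]) => group.length > 1));
}

// The dialog's default (and the timeout fallback): cheapest combined per-token cost.
function cheapestProvider(group: ModelEntry[]): ModelEntry {
  return group.reduce((a, b) =>
    a.inputCost + a.outputCost <= b.inputCost + b.outputCost ? a : b,
  );
}
```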

Auto-Discovery Scoring

Models are scored per-tier based on cost and capabilities:

  For each discovered model:
       │
       ├── tier=light/triage:  score = inputCost + outputCost
       │   (lowest cost wins)
       │
       ├── tier=medium:  score = |cost - medianCost| - bonuses
       │   (closest to median, bonus for reasoning + context window)
       │
       └── tier=heavy:  score = -cost - bonuses
           (most expensive + reasoning required, penalize non-reasoning)
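The three rules can be sketched as one function where the lowest score wins within a tier. The bonus weights, context-window threshold, and non-reasoning penalty below are assumed values, not the extension's actual ones:

```typescript
type Tier = "light" | "medium" | "heavy";

interface Candidate {
  cost: number;          // inputCost + outputCost
  reasoning: boolean;
  contextWindow: number; // tokens
}

// Lower score wins within a tier.
function tierScore(tier: Tier, m: Candidate, medianCost: number): number {
  if (tier === "light") {
    return m.cost; // lowest cost wins (also used for the triage/AI-fallback model)
  }
  if (tier === "medium") {
    let bonus = 0;
    if (m.reasoning) bonus += 0.5;                // assumed weight
    if (m.contextWindow >= 200_000) bonus += 0.5; // assumed threshold
    return Math.abs(m.cost - medianCost) - bonus; // closest to median, minus bonuses
  }
  // heavy: most expensive wins; penalize models without reasoning support
  return -m.cost + (m.reasoning ? 0 : 5); // assumed penalty
}
```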

Session Lifecycle

  session_start                before_agent_start         turn_end
       │                            │                       │
       ├─ load config               ├─ Phase 1 or 2         ├─ check escalation
       ├─ discover models           ├─ run 4-layer router   ├─ cost tracking
       ├─ resolve duplicates        ├─ apply model          └─ reset error counter
       ├─ auto-assign tiers         └─ log + notify
       └─ notify user
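Hypothetically, the three hooks above wire up as follows. Only the event names come from the table; the API shape is a guess, not the real pi extension interface:

```typescript
type LifecycleEvent = "session_start" | "before_agent_start" | "turn_end";

interface ExtensionApi {
  on(event: LifecycleEvent, handler: () => void): void;
}

function registerTriage(api: ExtensionApi): void {
  api.on("session_start", () => {
    // load config, discover models, resolve duplicate providers,
    // auto-assign tiers, notify the user
  });
  api.on("before_agent_start", () => {
    // Phase 1 (first prompt) or Phase 2 (subsequent prompts),
    // then apply the chosen model and log the decision
  });
  api.on("turn_end", () => {
    // check error escalation, track cost, reset the error counter on success
  });
}
```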

User Override

  User presses Ctrl+P or /model
       │
       ▼
  model_select event fires
       │
       ▼
  triageIsSettingModel? ──yes──► ignore (triage's own switch)
       │no
       ▼
  Set userTookControl = true
  Pause auto-routing
  Notify: "/triage on to re-enable"
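A re-entrancy flag is what distinguishes triage's own model switches from manual ones. The field names mirror the flow above; the class itself is an illustrative sketch:

```typescript
class OverrideGuard {
  private triageIsSettingModel = false;
  userTookControl = false;

  // Wrap triage-initiated switches so the model_select handler ignores them.
  applyModel(doSwitch: () => void): void {
    this.triageIsSettingModel = true;
    try {
      doSwitch();
    } finally {
      this.triageIsSettingModel = false;
    }
  }

  // model_select handler: only a user-initiated switch pauses auto-routing.
  onModelSelect(): void {
    if (this.triageIsSettingModel) return; // triage's own switch: ignore
    this.userTookControl = true;           // paused until /triage on
  }
}
```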

Features

  • Zero-delay fast path — keyword + heuristic covers 85%+ of prompts in <5ms
  • Cache-aware — routes once, only suggests upgrades mid-session
  • Provider-agnostic — auto-discovers from ~/.pi/agent/models.json
  • Duplicate provider handling — interactive dialog when same model is on multiple providers
  • Multi-step escalation — light → medium → heavy on consecutive errors
  • Cost tracking & logging — per-tier turn counts, optional JSON log
  • /triage dry-run <prompt> — test routing without applying

Commands

  Command                    Purpose
  /triage status             Full status: tiers, session, costs
  /triage history            Session history
  /triage config             Show resolved config
  /triage dry-run <prompt>   Test routing without applying
  /triage discover           Re-discover models
  /triage off/on             Pause/resume
  /triage reset              Re-route on next prompt

Config (~/.pi/agent/triage.json)

All fields optional. Auto-discovery handles models.

{
  "heuristicConfidenceThreshold": 0.6,
  "aiFallbackEnabled": true,
  "aiFallbackTimeoutMs": 10000,
  "midSessionMonitoring": {
    "enabled": true,
    "minTurnsBeforeSuggest": 2,
    "suggestTimeout": 8000,
    "suggestCooldownTurns": 3
  },
  "escalation": {
    "enabled": true,
    "errorThreshold": 2,
    "maxEscalations": 2
  },
  "logging": { "enabled": false }
}

Override specific tiers:

{
  "tiers": {
    "heavy": { "provider": "openrouter", "model": "z-ai/glm-5.1" }
  }
}

Installation

ln -s ~/path/to/pi-triage ~/.pi/agent/extensions/pi-triage

License

MIT
