Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.8.7 - 2026-03-10

Added

Cursor CLI Runtime Adapter

  • src/runtimes/cursor.ts — new runtime adapter for Cursor CLI (agent binary), implementing the AgentRuntime interface with TUI spawning via tmux, .cursor/rules/overstory.md instruction delivery, --yolo permission bypass, and headless one-shot mode — thanks to @XavierChevalier (#104, #66)
  • src/runtimes/cursor.test.ts — comprehensive test suite (497 lines) covering spawn command building, overlay generation, readiness detection, and transcript parsing

Runtime Stability Classification

  • stability field on AgentRuntime — new "stable" | "beta" | "experimental" field on the runtime interface; Claude and Sapling marked stable, Pi and Codex as beta, Copilot/Gemini/OpenCode/Cursor as experimental
  • Stability surfaced in ov agents and runtime documentation
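A minimal sketch of how the stability classification above could be typed and queried. The three levels and the per-runtime assignments come from the entry; the type name `Stability`, the `RUNTIME_STABILITY` table, and the `runtimesAt()` helper are hypothetical and are not taken from the project's source.

```typescript
// The three stability levels named in the changelog entry.
type Stability = "stable" | "beta" | "experimental";

// Hypothetical lookup table. In the project each adapter carries its own
// stability field; a flat table is used here only to keep the sketch short.
const RUNTIME_STABILITY: Record<string, Stability> = {
  claude: "stable",
  sapling: "stable",
  pi: "beta",
  codex: "beta",
  copilot: "experimental",
  gemini: "experimental",
  opencode: "experimental",
  cursor: "experimental",
};

// Hypothetical helper: list the runtimes at a given level, sorted by name.
function runtimesAt(level: Stability): string[] {
  return Object.entries(RUNTIME_STABILITY)
    .filter(([, stability]) => stability === level)
    .map(([name]) => name)
    .sort();
}
```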

Per-Coordinator Run Isolation

  • Per-coordinator session tracking — SessionStore now tracks coordinator_name per session with auto-migration for existing databases, enabling isolated run tracking when multiple coordinators operate in the same project
  • OVERSTORY_TASK_ID env var — slung agents now receive their task ID as an environment variable; tracker close commands are guarded to prevent agents from closing issues outside their assigned scope

Dashboard Runtime Column

  • Runtime column in dashboard agent panel — the live TUI dashboard now shows which runtime each agent is using (e.g., claude, cursor, sapling) — thanks to @mustafamagdy (#99)

Fixed

  • Dashboard crash on SQLite lock contention — ov dashboard no longer crashes when concurrent agents cause SQLITE_BUSY; database reads are wrapped with retry logic
  • Silent content loss in merge auto-resolve — merge resolver Tier 2 (hunk-level) no longer silently drops non-conflicting content when resolving conflicts; the entire file is now preserved correctly
  • ov init ENOENT on spawner calls — spawner() calls for ecosystem tool detection are now wrapped in try/catch to prevent crashes when mulch/sd/cn CLIs are not installed
  • Shift+tab false positive in detectReady — the hasStatusBar check no longer matches shift+tab escape sequences as a status bar indicator, preventing premature ready detection
  • Claude bypass dialog and Codex shared state — Claude runtime's detectReady() now recognizes the "bypass" dialog phase; Codex runtime correctly handles sharedWritableDirs spawn option — thanks to @Ilanbux (#101)
  • Tmux pane retry for WSL2 race condition — capturePaneContent() and sendKeys() now retry on transient tmux failures caused by WSL2 timing issues — thanks to @arosstale (#78)
  • Fish shell tmux spawn — tmux session commands are now wrapped in /bin/bash -c to prevent failures when the user's default shell is fish
  • coordinator_name column migration — createSessionStore() now auto-migrates existing sessions tables to add the coordinator_name column without data loss

Testing

  • 3364 tests across 100 files (7924 expect() calls)
  • New: src/runtimes/cursor.test.ts, src/commands/ecosystem.test.ts

0.8.6 - 2026-03-06

Added

Coordinator Completion Protocol

  • ov coordinator check-complete — new subcommand that evaluates configured exit triggers (allAgentsDone, taskTrackerEmpty, onShutdownSignal) and returns per-trigger status; complete = true only when ALL enabled triggers are met
  • coordinator.exitTriggers config — new coordinator section in config.yaml with three boolean triggers controlling automatic coordinator shutdown (all default to false)
  • Exit-trigger evaluation integrated into coordinator completion protocol — the coordinator can now self-terminate when configured conditions are met
  • allAgentsDone trigger also checks the merge queue to prevent premature shutdown while branches are still pending merge
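The exit-trigger rule above (complete only when every enabled trigger is met) can be sketched as follows. The trigger names come from the entry; the `TriggerFlags` and `CheckCompleteResult` shapes and the `checkComplete()` function are illustrative assumptions. How the real command behaves when no trigger is enabled is not stated in the changelog, so the sketch assumes the result is then not complete.

```typescript
// Trigger names as listed in the changelog entry.
type TriggerName = "allAgentsDone" | "taskTrackerEmpty" | "onShutdownSignal";
type TriggerFlags = Record<TriggerName, boolean>;

// Hypothetical result shape: overall verdict plus per-trigger status.
interface CheckCompleteResult {
  complete: boolean;
  triggers: Record<TriggerName, { enabled: boolean; met: boolean }>;
}

const TRIGGER_NAMES: TriggerName[] = ["allAgentsDone", "taskTrackerEmpty", "onShutdownSignal"];

// `enabled` mirrors coordinator.exitTriggers; `met` is the observed state.
function checkComplete(enabled: TriggerFlags, met: TriggerFlags): CheckCompleteResult {
  const triggers = {} as CheckCompleteResult["triggers"];
  let anyEnabled = false;
  let allEnabledMet = true;
  for (const name of TRIGGER_NAMES) {
    triggers[name] = { enabled: enabled[name], met: met[name] };
    if (enabled[name]) {
      anyEnabled = true;
      if (!met[name]) allEnabledMet = false;
    }
  }
  // Assumption: with no triggers enabled there is nothing to satisfy,
  // so the coordinator is never reported as complete.
  return { complete: anyEnabled && allEnabledMet, triggers };
}
```

Disabled triggers are reported in the per-trigger status but do not affect the overall verdict.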

Spawn Rollback

  • rollbackWorktree() — new helper in src/worktree/manager.ts that removes a worktree and deletes its branch (best-effort, errors swallowed)
  • ov sling rollback on spawn failure — if agent spawn fails after worktree creation, the worktree and branch are automatically rolled back to avoid orphaned resources

Per-Agent Cleanup

  • ov clean --agent <name> — targeted cleanup of a single agent: kills tmux session or process tree, removes worktree, deletes branch, clears agent and log directories, logs synthetic session-end event, and marks session as completed
  • ov stop --clean-worktree on completed agents — previously threw an error for completed agents; now skips the kill step and proceeds directly to worktree+branch cleanup

Merge Reliability

  • Auto-commit os-eco state files before merge — runtime state files (.seeds/, .overstory/, .mulch/, .canopy/, .greenhouse/, .claude/, CLAUDE.md) are automatically committed with chore: sync os-eco runtime state to prevent dirty-tree merge errors
  • Stash/pop dirty files during merge — uncommitted changes are stashed before merge and popped afterward, with proper cleanup on failure
  • onMergeSuccess callback — createMergeResolver() now accepts an optional onMergeSuccess hook called after successful merge of each entry
  • Untracked file handling in merge resolver improved to prevent conflicts between tracked and untracked files

Init Scaffold Commit

  • Auto-commit scaffold files at end of ov init — ecosystem directories (.overstory/, .seeds/, .mulch/, .canopy/, .gitattributes, CLAUDE.md) are committed so agent branches don't cause untracked-vs-tracked conflicts during merge

Fixed

  • Headless agent kill blast radius — killSession("") with tmux prefix matching could kill ALL tmux sessions; watchdog now uses killAgent() helper that routes headless agents through PID-based killProcessTree() and TUI agents through named tmux sessions
  • Stale headless agent detection — watchdog now checks isProcessAlive(pid) for headless agents instead of only checking tmux session liveness
  • Coordinator state file commit — completion protocols now commit os-eco state files before final steps to prevent dirty-tree errors downstream
  • Coordinator premature issue closure — coordinator no longer closes seeds issues before the lead agent merges its branch; allAgentsDone trigger checks merge queue for pending branches
  • Coordinator auto-complete on session-end — ov run complete is no longer called automatically from the per-turn Stop hook, preventing premature run completion
  • Self-exiting coordinator — session-end hook now handles coordinators that exit themselves (e.g., via exit triggers) without throwing errors
  • --json flag stolen by parent Commander — .enablePositionalOptions() added to the root program so subcommand --json flags are not consumed by the parent parser
  • Pi runtime transcript parsing — Pi v3 JSONL format stores token usage inside message events at message.usage.{input, output, cacheRead}, not in message_end events; parser now handles both formats with cacheRead counted toward input tokens (#82)
  • Pi getTranscriptDir() — now returns ~/.pi/agent/sessions/{encoded-project-path}/ instead of null, enabling ov costs for Pi agents (#82)
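As a rough illustration of the Pi transcript fix above, the sketch below sums token usage from JSONL lines that carry usage at message.usage.{input, output, cacheRead}, counting cacheRead toward input tokens as the entry describes. The function name `sumPiUsage()` and the `UsageTotals` shape are hypothetical, and the older message_end format is deliberately not handled because the changelog does not describe its layout.

```typescript
// Hypothetical totals shape for this sketch.
interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
}

// Sum usage across JSONL lines; malformed lines and events without
// message.usage are skipped.
function sumPiUsage(jsonl: string): UsageTotals {
  const totals: UsageTotals = { inputTokens: 0, outputTokens: 0 };
  const num = (v: unknown): number => (typeof v === "number" ? v : 0);
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // not valid JSON, ignore
    }
    const usage = (event as { message?: { usage?: Record<string, unknown> } } | null)
      ?.message?.usage;
    if (!usage) continue;
    // cacheRead is counted toward input tokens, per the changelog entry.
    totals.inputTokens += num(usage["input"]) + num(usage["cacheRead"]);
    totals.outputTokens += num(usage["output"]);
  }
  return totals;
}
```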

Changed

  • CLI command count: 34 → 35 (new check-complete subcommand under ov coordinator)

Testing

  • 3248 tests across 98 files (7677 expect() calls)

0.8.5 - 2026-03-05

Added

OpenCode Runtime Adapter

  • src/runtimes/opencode.ts — new runtime adapter for SST OpenCode (opencode CLI), implementing the AgentRuntime interface with model flag support, AGENTS.md instruction file, and headless subprocess spawning
  • src/runtimes/opencode.test.ts — test suite (325 lines) covering spawn command building, overlay generation, guard rules, and environment setup

NDJSON Event Tailer for Headless Agents

  • src/events/tailer.ts — background NDJSON event tailer that polls stdout.log files from headless agents (e.g. Sapling, OpenCode), parses new lines, and writes them into events.db via EventStore — enabling ov status, ov dashboard, and ov feed to show live progress for headless agents
  • src/events/tailer.test.ts — test suite (461 lines) covering line parsing, file tailing, stop/cleanup, and edge cases
  • Watchdog integration — runDaemonTick() now automatically starts/stops event tailers for active headless agents, with module-level tailer registry persisting across ticks
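The core of an NDJSON tailer like the one described above is turning newly read bytes into complete parsed lines. A minimal sketch, assuming the caller polls the file and passes in whatever text was appended since the last poll; the `TailState` shape and `consumeChunk()` are hypothetical names, and buffering of a trailing partial line is an assumption about how such a tailer would typically avoid parsing half-written events.

```typescript
// Hypothetical per-file state: text read so far that has no newline yet.
interface TailState {
  remainder: string;
}

// Parse every complete line in `chunk` (prefixed by any earlier remainder)
// and keep the trailing partial line for the next poll.
function consumeChunk(state: TailState, chunk: string): unknown[] {
  const lines = (state.remainder + chunk).split("\n");
  // The last element is "" if the chunk ended on a newline, else a partial line.
  state.remainder = lines.pop() ?? "";
  const events: unknown[] = [];
  for (const line of lines) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip malformed lines rather than aborting the tail.
    }
  }
  return events;
}
```

In a full tailer the returned events would then be written to the event store; that step is omitted here.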

Headless Agent Inspection

  • ov inspect stdout.log fallback — when --no-tmux or tmux capture fails, inspect now falls back to reading the agent's stdout.log NDJSON file, parsing recent events to display tool activity and progress for headless agents

Fixed

  • Sapling buildDirectSpawn() crash — model resolution logic now guards against undefined model parameter instead of unconditionally calling .toUpperCase() on it; --model flag is only appended when a model is actually specified
  • Sapling API key leak — ANTHROPIC_API_KEY is now explicitly cleared in the child process environment to prevent the parent session's key from leaking into sapling subprocesses; gateway providers re-set it as needed

Testing

  • 3201 tests across 98 files (7551 expect() calls)

0.8.4 - 2026-03-04

Added

Per-Capability Runtime Routing

  • runtime.capabilities config field — maps capability names (e.g. builder, scout, coordinator) to runtime adapter names, enabling heterogeneous fleets where different agent roles use different runtimes
  • getRuntime() now accepts a capability parameter; lookup chain: explicit --runtime flag > capabilities[cap] > default > "claude"
  • 4 tests covering capability routing, fallback, explicit override, and undefined capabilities
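The lookup chain above (explicit --runtime flag > capabilities[cap] > default > "claude") can be sketched as a small pure function. The `RuntimeConfig` shape and the name `resolveRuntimeName()` are assumptions for illustration; the real getRuntime() returns an adapter, not a name.

```typescript
// Hypothetical slice of the runtime config section.
interface RuntimeConfig {
  default?: string;
  capabilities?: Record<string, string>;
}

// Resolve the runtime adapter name following the documented precedence.
function resolveRuntimeName(
  config: RuntimeConfig,
  capability?: string,
  explicitFlag?: string,
): string {
  // 1. An explicit --runtime flag always wins.
  if (explicitFlag) return explicitFlag;
  // 2. Per-capability mapping, if one exists for this capability.
  const mapped = capability ? config.capabilities?.[capability] : undefined;
  if (mapped) return mapped;
  // 3. Configured default, then 4. the built-in "claude" fallback.
  return config.default ?? "claude";
}
```

For example, with capabilities of { scout: "sapling" } and a default of "codex", a scout resolves to "sapling" and a builder to "codex".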

Runtime-Agnostic Transcript Discovery

  • getTranscriptDir() method added to AgentRuntime interface — each runtime adapter now owns its transcript directory resolution instead of hardcoding Claude Code paths in the costs command
  • All 6 runtime adapters implement getTranscriptDir() (Claude returns project-specific path; others return null)

Dynamic Instruction Path Discovery

  • getKnownInstructionPaths() in agents.ts now queries all registered runtimes via getAllRuntimes() instead of maintaining a hardcoded list, so new runtimes are automatically discovered

Fixed

  • Dirty working tree merge guard — ov merge now detects uncommitted changes to tracked files before attempting a merge and throws a clear error, preventing cascading failures through all 4 tiers with misleading empty conflict lists
  • 5 tests covering the dirty-tree detection in resolver.test.ts

Changed

  • Decoupled Claude Code specifics from costs, transcript, and agent discovery modules — estimateCost re-export removed from transcript.ts (import directly from pricing.ts), transcript dir resolution moved from costs command into runtime adapters, instruction path list derived from runtime registry

Testing

  • 3137 tests across 96 files (7420 expect() calls)

0.8.3 - 2026-03-04

Added

Auto-Generated Agent Names

  • ov sling no longer requires --name — when omitted, generates a unique name from {capability}-{taskId}, with -2, -3 suffixes to avoid collisions against active sessions
  • generateAgentName() helper exported from src/commands/sling.ts with collision-avoidance logic
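A minimal sketch of the naming scheme above: {capability}-{taskId}, with -2, -3, and so on appended until the name no longer collides with an active session. The function name and signature below are assumptions and may differ from the exported generateAgentName().

```typescript
// Build a unique agent name from capability and task ID, avoiding any
// name already used by an active session.
function generateAgentNameSketch(
  capability: string,
  taskId: string,
  activeNames: Iterable<string>,
): string {
  const taken = new Set(activeNames);
  const base = `${capability}-${taskId}`;
  if (!taken.has(base)) return base;
  // Suffixes start at 2, matching the "-2, -3" pattern in the entry.
  let n = 2;
  while (taken.has(`${base}-${n}`)) n++;
  return `${base}-${n}`;
}
```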

Direct Scout/Builder Spawn

  • Coordinator can now spawn scouts and builders directly — previously only lead was allowed without --parent; scouts and builders are now also permitted for lightweight tasks that don't need a lead intermediary

Runtime-Aware Instruction Path

  • {{INSTRUCTION_PATH}} placeholder in agent definitions — all agent .md files now use a runtime-resolved placeholder instead of hardcoded .claude/CLAUDE.md, enabling Codex (AGENTS.md), Sapling (SAPLING.md), and other runtimes to place overlays at their native instruction path
  • instructionPath field added to OverlayConfig type and generateOverlay() function

Fixed

  • Codex runtime startup — buildSpawnCommand() now uses interactive codex (not codex exec) so sessions stay alive in tmux; omits --model for Anthropic aliases that Codex CLI doesn't accept (thanks @vidhatanand)
  • Zombie agent cleanup — ov stop now cleans up zombie agents (marks them completed) instead of erroring with "already zombie"
  • Headless stdout redirect — ov sling always redirects headless agent stdout to file, preventing backpressure-induced zombie processes
  • Config warning deduplication — non-Anthropic model warnings in validateConfig now emit once per process instead of on every loadConfig() call
  • Codex bare model refs — validateConfig now accepts bare model references (e.g., gpt-5.3-codex) when the default runtime is codex, instead of requiring provider-prefixed format

Changed

  • Agent definition .md files updated to use {{INSTRUCTION_PATH}} placeholder (builder, lead, merger, reviewer, scout, supervisor, orchestrator)

Testing

  • 3130 tests across 96 files (7406 expect() calls)

0.8.2 - 2026-03-04

Added

RuntimeConnection Registry

  • src/runtimes/connections.ts — module-level connection registry for active RuntimeConnection instances, tracking RPC connections to headless agent processes (e.g., Sapling) keyed by agent name
  • getConnection(), setConnection(), removeConnection() for lifecycle management with automatic close() on removal
  • 6 tests in src/runtimes/connections.test.ts

Sapling RPC Enhancements

  • RuntimeConnection for SaplingRuntime — full RPC support enabling direct stdin/stdout communication with Sapling agent processes
  • Model alias resolution in buildEnv() and buildDirectSpawn() — expands sonnet/opus/haiku aliases correctly

Fixed

  • Headless backpressure zombie — ov sling now redirects headless agent stdout/stderr to log files to prevent backpressure from causing zombie processes
  • deployConfig guard write — always writes guards.json even when overlay is undefined, preventing missing guard files for headless runtimes
  • Sapling model alias resolution — correct alias expansion in both buildEnv() and buildDirectSpawn() paths

Testing

  • 3116 tests across 96 files (7373 expect() calls)

0.8.1 - 2026-03-04

Added

Sapling Runtime Adapter

  • Sapling (sp) runtime adapter — full AgentRuntime implementation for the Sapling headless coding agent
  • Headless: runs as a Bun subprocess (no tmux TUI), communicates via NDJSON event stream on stdout (--json)
  • Instruction file: SAPLING.md written to worktree root (agent overlay content)
  • Guard deployment: .sapling/guards.json written from guard-rules.ts constants
  • Model alias resolution: expands sonnet/opus/haiku aliases via ANTHROPIC_DEFAULT_*_MODEL env vars
  • buildEnv() configures ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, provider routing
  • Registered in runtime registry as "sapling", available via ov sling --runtime sapling
  • Sapling v0.1.5 event types added to EventType union and theme labels
  • 972 lines of test coverage in src/runtimes/sapling.test.ts

Headless Agent Spawn Path

  • Headless spawn in ov sling — when runtime.headless === true, bypasses tmux entirely and spawns agents as direct Bun subprocesses
  • New src/worktree/process.ts module: spawnHeadlessAgent() for direct Bun.spawn() invocation, HeadlessProcess interface for PID/stdin/stdout management
  • DirectSpawnOpts and AgentEvent types added to src/runtimes/types.ts
  • Headless fields added to AgentRuntime interface

Headless Agent Lifecycle Support

  • ov status, ov dashboard, ov inspect updated to handle tmux-less (headless) agents gracefully
  • ov stop updated with headless process termination via PID-based killProcessTree()
  • Health evaluation in src/watchdog/health.ts supports headless agent lifecycle (PID liveness instead of tmux session checks)

Fixed

  • CLAUDECODE env clearing — clear CLAUDECODE env var in tmux sessions for Claude Code >=2.1.66 compatibility
  • Stale comment — update --mode rpc comment to --json in process.ts

Changed

  • Runtime adapters grew from 5 to 6 (added Sapling)

Testing

  • 3089 tests across 95 files (7324 expect() calls)
  • New test files: src/runtimes/sapling.test.ts, src/agents/guard-rules.test.ts, src/worktree/process.test.ts, src/commands/stop.test.ts, src/commands/status.test.ts, src/commands/dashboard.test.ts, src/watchdog/health.test.ts

0.8.0 - 2026-03-03

Added

Coordinator Interaction Subcommands

  • ov coordinator send — fire-and-forget message to the running coordinator via mail + auto-nudge, replacing the two-step ov mail send + ov nudge pattern
  • ov coordinator ask — synchronous request/response to the coordinator; sends a dispatch mail with a correlationId, auto-nudges, polls for a reply in the same thread, and exits with the reply body (configurable --timeout, default 120s)
  • ov coordinator output — show recent coordinator output via tmux capture-pane (configurable --lines, default 100)
  • 334 lines of new test coverage in src/commands/coordinator.test.ts

Orchestrator Agent Definition

  • agents/orchestrator.md — new base agent definition for multi-repo coordination above the coordinator level
  • Defines the orchestrator role: dispatches coordinators per sub-repo via ov coordinator start --project, monitors via mail, never modifies code directly
  • Named failure modes: DIRECT_SLING, CODE_MODIFICATION, SPEC_WRITING, OVERLAPPING_REPO_SCOPE, OVERLAPPING_FILE_SCOPE, DIRECT_MERGE, PREMATURE_COMPLETION, SILENT_FAILURE, POLLING_LOOP
  • 239 lines of agent definition

Operator Message Protocol for Coordinator

  • operator-messages section added to agents/coordinator.md — defines how coordinators handle synchronous human requests from the CLI
  • Reply format: always reply via ov mail reply with correlationId echo
  • Status request format: structured Active leads / Completed / Blockers / Next actions
  • Dispatch, stop, merge, and unrecognized request handling rules

--project Global Flag

  • ov --project <path> — target a different project root for any command, overriding auto-detection
  • Validates that the target path contains .overstory/config.yaml; throws ConfigError if missing
  • setProjectRootOverride() / getProjectRootOverride() / clearProjectRootOverride() in src/config.ts
  • 66 lines of new test coverage in src/config.test.ts

ov update Command

  • ov update — refresh .overstory/ managed files from the installed npm package without requiring a full ov init
  • Refreshes: agent definitions (agent-defs/*.md), agent-manifest.json, hooks.json, .gitignore, README.md
  • Does NOT touch: config.yaml, config.local.yaml, SQLite databases, agent state, worktrees, specs, logs, or .claude/settings.local.json
  • Flags: --agents, --manifest, --hooks, --dry-run, --json
  • Excludes deprecated agent defs (supervisor.md)
  • 464 lines of test coverage in src/commands/update.test.ts

Changed

  • Agent types grew from 7 to 8 (added orchestrator)
  • CLI commands grew from 32 to 34 (added update, coordinator send, coordinator ask, coordinator output)

Testing

  • 2923 tests across 92 files (6852 expect() calls)

0.7.9 - 2026-03-03

Added

Gemini CLI Runtime Adapter

  • Gemini CLI (gemini) runtime adapter — full AgentRuntime implementation for Google's Gemini coding agent
  • TUI-based interactive mode via tmux (Ink-based TUI, similar to Copilot adapter)
  • Instruction file: GEMINI.md written to worktree root (agent overlay content)
  • Sandbox support via --sandbox flag, --approval-mode yolo for auto-approval
  • Headless mode: gemini -p "prompt" for one-shot calls
  • Transcript parsing from --output-format stream-json NDJSON events
  • Registered in runtime registry as "gemini", available via ov sling --runtime gemini
  • 537 lines of test coverage in src/runtimes/gemini.test.ts

Model Alias Expansion via Environment Variables

  • ANTHROPIC_DEFAULT_{ALIAS}_MODEL env vars — expand model aliases (sonnet, opus, haiku) to specific model IDs at runtime
  • expandAliasFromEnv() in src/agents/manifest.ts checks ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL
  • Applied during resolveModel() — env var values override default alias resolution
  • 169 lines of new test coverage in src/agents/manifest.test.ts
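The alias expansion above can be sketched as a lookup of ANTHROPIC_DEFAULT_{ALIAS}_MODEL in the environment. The environment is passed in as a plain record so the sketch stays self-contained; the function name and the choice to return the input unchanged when no variable is set are assumptions, since the changelog does not say what the real expandAliasFromEnv() returns in that case.

```typescript
// Aliases named in the changelog entry.
const MODEL_ALIASES = ["sonnet", "opus", "haiku"];

// Expand a model alias from an environment record. Non-alias names and
// aliases without a matching env var are returned unchanged (assumption).
function expandAliasFromEnvSketch(
  model: string,
  env: Record<string, string | undefined>,
): string {
  if (!MODEL_ALIASES.includes(model)) return model;
  const value = env[`ANTHROPIC_DEFAULT_${model.toUpperCase()}_MODEL`];
  return value && value.trim() !== "" ? value : model;
}
```

In a Bun or Node process the second argument would typically be process.env.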

Fixed

  • .overstory/.gitignore — un-ignore agent-defs/ contents so custom agent definitions are tracked by git
  • CI lint — fix import sort order in sling.test.ts

Testing

  • 2888 tests across 91 files (6768 expect() calls)

0.7.8 - 2026-03-02

Added

Shell Init Delay

  • runtime.shellInitDelayMs config option — configurable delay between tmux session creation and TUI readiness polling, giving slow shells (oh-my-zsh, nvm, starship, etc.) time to initialize before the agent command starts
  • Applied to both ov sling and ov coordinator start spawn paths
  • Validation: must be non-negative number; values above 30s trigger a warning

--base-branch Flag for ov sling

  • ov sling --base-branch <branch> — override the base branch for worktree creation instead of using the canonical branch
  • Resolution order: --base-branch flag > current HEAD > config.project.canonicalBranch
  • New getCurrentBranch() helper in src/commands/sling.ts

Token Snapshot Run Tracking

  • run_id column added to token_snapshots table — snapshots are now tagged with the active run ID when recorded
  • getLatestSnapshots() accepts optional runId parameter to filter snapshots by run
  • ov costs --live now scopes to current run when --run is provided
  • Migration migrateSnapshotRunIdColumn() safely adds the column to existing databases

Tmux Session State Detection

  • checkSessionState() in src/worktree/tmux.ts — detailed session state reporting that distinguishes "alive", "dead", and "no_server" states (vs the boolean isSessionAlive())
  • Used by coordinator to provide targeted error messages and clean up stale sessions

Fixed

Coordinator Zombie Detection

  • src/commands/coordinator.ts — ov coordinator start now detects zombie coordinator sessions (tmux pane exists but agent process has exited) and automatically reclaims them instead of blocking with "already running"
  • Stale sessions where tmux is dead or server is not running are now cleaned up before re-spawning
  • Handles pid-null edge case (sessions from older schema) conservatively

Shell Init Delay Validation

  • src/config.ts — validates shellInitDelayMs is a non-negative finite number; warns on values above 30s; falls back to default (0) on invalid input
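The validation rules above (non-negative finite number, warning above 30s, fallback to 0 on invalid input) could look roughly like this. The `DelayValidation` shape, the function name, and the warning texts are hypothetical; whether the real validator also emits a message for invalid input is an assumption.

```typescript
// Hypothetical result: the value to use plus an optional warning.
interface DelayValidation {
  value: number;
  warning?: string;
}

const MAX_REASONABLE_DELAY_MS = 30_000; // the 30s threshold from the entry

function validateShellInitDelayMs(input: unknown): DelayValidation {
  // Reject non-numbers, NaN, Infinity, and negative values; use the default 0.
  if (typeof input !== "number" || !Number.isFinite(input) || input < 0) {
    return { value: 0, warning: "shellInitDelayMs must be a non-negative finite number; using 0" };
  }
  // Large values are accepted but flagged.
  if (input > MAX_REASONABLE_DELAY_MS) {
    return { value: input, warning: "shellInitDelayMs is above 30s" };
  }
  return { value: input };
}
```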

Testing

  • 2830 tests across 90 files (6689 expect() calls)
  • src/metrics/pricing.test.ts — new test suite covering getPricingForModel() and estimateCost()
  • src/metrics/store.test.ts — snapshot run_id recording and filtering tests
  • src/commands/coordinator.test.ts — zombie detection, stale session cleanup, and pid-null edge case tests
  • src/commands/sling.test.ts — --base-branch flag and getCurrentBranch() tests
  • src/config.test.ts — shellInitDelayMs validation tests
  • src/worktree/tmux.test.ts — checkSessionState() tests

0.7.7 - 2026-02-27

Added

Codex Runtime Adapter

  • src/runtimes/codex.ts — new CodexRuntime adapter implementing the AgentRuntime interface for OpenAI's codex CLI, with headless codex exec mode, OS-level sandbox security (Seatbelt/Landlock), AGENTS.md instruction path, and NDJSON event stream parsing for token usage
  • src/runtimes/codex.test.ts — comprehensive test suite (741 lines) covering spawn command building, config deployment, readiness detection, and transcript parsing
  • Runtime registry now includes codex alongside claude, pi, and copilot

Documentation

  • docs/runtime-adapters.md — contributor guide (991 lines) covering the AgentRuntime interface, all four built-in adapters, the registry pattern, and a step-by-step walkthrough for adding new runtimes

Changed

Dashboard Redesign

  • src/commands/dashboard.ts — rewritten with rolling event buffer, compact panels, and new multi-panel layout (Agents 60% + Tasks/Feed 40%, Mail + Merge Queue row, Metrics row)

Fixed

  • src/commands/init.test.ts — use no-op spawner in init tests to avoid CI failures from tmux/subprocess side effects

Testing

  • 2779 tests across 89 files (6591 expect() calls)

0.7.6 - 2026-02-27

Added

Copilot Runtime Adapter

  • src/runtimes/copilot.ts — new CopilotRuntime adapter implementing the AgentRuntime interface for GitHub Copilot's copilot CLI, with --allow-all-tools permission mode, .github/copilot-instructions.md instruction path, and transcript parsing support
  • src/runtimes/copilot.test.ts — comprehensive test suite (507 lines) covering spawn command building, config deployment, readiness detection, and transcript parsing
  • Runtime registry now includes copilot alongside claude and pi

Ecosystem Bootstrap in ov init

  • ov init now bootstraps sibling os-eco tools — automatically runs mulch init, sd init, and cn init when the respective CLIs are available; adds CLAUDE.md onboarding sections for each tool
  • New flags: --tools <list> (comma-separated tool selection), --skip-mulch, --skip-seeds, --skip-canopy, --skip-onboard, --json
  • src/commands/init.test.ts — expanded with ecosystem bootstrap tests (335 lines total)

Doctor Provider Checks

  • src/doctor/providers.ts — new providers check category (11th category) validating gateway provider reachability, auth token environment variables, and tool-use compatibility for multi-runtime configurations
  • src/doctor/providers.test.ts — test suite (373 lines) covering provider validation scenarios

Multi-Provider Model Pricing

  • src/metrics/pricing.ts — extended with OpenAI (GPT-4o, GPT-4o-mini, GPT-5, o1, o3) and Google Gemini (Flash, Pro) pricing alongside existing Claude tiers

Cost Analysis Enhancements

  • --bead <id> flag for ov costs — filter cost breakdown by task/bead ID via new MetricsStore.getSessionsByTask() method
  • Runtime-aware transcript discovery — ov costs --self now resolves transcript paths through the runtime adapter instead of hardcoding Claude Code paths

Agent Discovery Improvements

  • Runtime-aware instruction path in ov agents discover — extractFileScope() now tries the configured runtime's instructionPath before falling back to KNOWN_INSTRUCTION_PATHS

Changed

  • CI: CHANGELOG-based GitHub release notes — publish workflow now extracts the version's CHANGELOG.md section for release notes instead of auto-generating from commits; falls back to --generate-notes if no entry found
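Extracting one version's section from a changelog, as the publish workflow above does, can be sketched like this. It assumes Keep a Changelog style headings of the form "## [x.y.z] - date" in the source file; the function name is hypothetical and the actual workflow may use a shell tool instead. A null result corresponds to the "no entry found" case where the workflow falls back to --generate-notes.

```typescript
// Return the body of the section for `version`, or null if there is none.
// Assumes version headings look like "## [0.7.6] - 2026-02-26".
function extractChangelogSection(changelog: string, version: string): string | null {
  const lines = changelog.split("\n");
  const start = lines.findIndex((line) => line.startsWith(`## [${version}]`));
  if (start === -1) return null;
  // The section ends at the next level-2 heading or at end of file.
  // "### Added" style subheadings do not match the "## " prefix.
  const next = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  const end = next === -1 ? lines.length : next;
  return lines.slice(start + 1, end).join("\n").trim();
}
```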

Fixed

  • Pi coding agent URL updated in README to correct repository path

Testing

  • 2714 tests across 88 files (6481 expect() calls)

0.7.5 - 2026-02-26

Fixed

  • tmux "command too long" error — coordinator, monitor, and supervisor commands now pass agent definition file paths instead of inlining content via --append-system-prompt; the shell inside the tmux pane reads the file via $(cat ...) at runtime, keeping the tmux IPC message small regardless of agent definition size (fixes #45)
  • Biome formatting in seeds tracker test (src/tracker/seeds.test.ts)

Changed

  • SpawnOpts.appendSystemPromptFile — new option in AgentRuntime interface (src/runtimes/types.ts) for file-based system prompt injection; both Claude and Pi runtime adapters support it with fallback to inline appendSystemPrompt
  • README and package description updated to be runtime-agnostic, reflecting the AgentRuntime abstraction

Testing

  • 2612 tests across 86 files (6277 expect() calls)

0.7.4 - 2026-02-26

Added

Runtime-Agnostic Pricing Module

  • src/metrics/pricing.ts — extracted pricing logic from transcript.ts into a standalone module with TokenUsage, ModelPricing, getPricingForModel(), and estimateCost() exports, enabling any runtime (not just Claude Code) to use cost estimation without pulling in JSONL-specific parsing logic

Multi-Runtime Instruction File Discovery

  • KNOWN_INSTRUCTION_PATHS in agents.ts — extractFileScope() now tries .claude/CLAUDE.md then AGENTS.md (future Codex support) instead of hardcoding Claude Code's overlay path

Mulch Classification Guidance

  • --classification guidance in all 8 agent definitions — builder, coordinator, lead, merger, monitor, reviewer, and scout definitions updated with --classification <foundational|tactical|observational> guidance for ml record commands, with inline descriptions of when to use each classification level

Pi Runtime Improvements

  • agent_end handler in Pi guard extensions — Pi agents now log session-end when the agentic loop completes (via agent_end event), preventing watchdog false-positive zombie escalation; session_shutdown handler kept as a safety net for crashes and force-kills
  • --tool-name forwarding in Pi guard extensions — ov log tool-start and ov log tool-end calls now correctly forward the tool name

Testing

  • Tracker adapter test suites — comprehensive tests for beads (src/tracker/beads.test.ts, 454 lines) and seeds (src/tracker/seeds.test.ts, 469 lines) backends covering CLI invocation, JSON parsing, error handling, and edge cases
  • Test suite grew from 2550 to 2607 tests across 86 files (6269 expect() calls)

Fixed

  • OVERSTORY_GITIGNORE import in prime.ts — removed duplicate constant definition, now imports from init.ts where the canonical constant lives
  • Pi agent zombie-state bug — without the agent_end handler, completed Pi agents were never marked "completed" in the SessionStore, causing the watchdog to escalate them through stalled → nudge → triage → terminate
  • Shell completions for sling — added missing --runtime flag to shell completion definitions (PR #39, thanks @lucabarak)
  • cleanupTempDir ENOENT/EBUSY handling — tightened catch block for ENOENT errors and added retry logic for EBUSY from SQLite WAL handles on Windows (#41)

0.7.3 - 2026-02-26

Added

Outcome Feedback Loop

  • Mulch outcome tracking — sling now captures applied mulch record IDs at spawn time (saved to .overstory/agents/{name}/applied-records.json) and ov log session-end appends "success" outcomes back to those records, closing the expertise feedback loop
  • MulchClient.appendOutcome() method for programmatic outcome recording with status, duration, agent, notes, and test results fields

Mulch Search/Prime Enrichment

  • --classification filter for mulch search (foundational, tactical, observational)
  • --outcome-status filter for mulch search (success, failure)
  • --sort-by-score support in mulch prime for relevance-ranked expertise injection

Dashboard Redesign

  • Tasks panel — upper-right quadrant displays tracker issues with priority colors
  • Feed panel — lower-right quadrant shows recent events from the last 5 minutes
  • dimBox — dimmed box-drawing characters for less aggressive panel borders
  • computeAgentPanelHeight() — dynamic agent panel sizing (min 8, max 50% screen, scales with agent count)
  • Tracker caching with 10s TTL to reduce repeated CLI calls
  • Layout restructured to 60/40 split (agents left, tasks+feed right) with 50/50 mail/merge at bottom
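The sizing rule for computeAgentPanelHeight() above (min 8, max 50% of the screen, scaling with agent count) amounts to a clamp. The sketch below is an assumption-heavy illustration: the number of extra rows for borders and headers, and the choice to let the minimum win on very small terminals, are not stated in the changelog.

```typescript
const MIN_PANEL_ROWS = 8; // "min 8" from the entry
const CHROME_ROWS = 3; // assumption: rows used by borders and a header

// One row per agent plus chrome, capped at half the terminal height,
// but never below the minimum.
function computeAgentPanelHeightSketch(agentCount: number, terminalRows: number): number {
  const desired = agentCount + CHROME_ROWS;
  const maxRows = Math.floor(terminalRows / 2);
  return Math.max(MIN_PANEL_ROWS, Math.min(desired, maxRows));
}
```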

Formatting

  • formatEventLine() — centralized compact event formatting with agent colors and event labels (used by both feed and dashboard)
  • numericPriorityColor() — maps numeric priorities (1–4) to semantic colors
  • buildAgentColorMap() and extendAgentColorMap() — stable color assignment for agents by appearance order
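Stable color assignment by appearance order, as described for buildAgentColorMap() and extendAgentColorMap() above, can be sketched with a map that only ever grows. The palette values and the function signatures below are hypothetical; the point shown is that an agent keeps its color once assigned, and new agents take the next palette slot.

```typescript
// Hypothetical palette; the project's actual colors are not in the changelog.
const AGENT_PALETTE: string[] = ["cyan", "magenta", "yellow", "green", "blue"];

// Add colors for any agents not yet in the map, in order of appearance.
// Existing assignments are never changed.
function extendAgentColorMapSketch(
  map: Map<string, string>,
  agentNames: string[],
): Map<string, string> {
  for (const name of agentNames) {
    if (!map.has(name)) {
      // The palette index is the number of agents seen so far, wrapping around.
      map.set(name, AGENT_PALETTE[map.size % AGENT_PALETTE.length] as string);
    }
  }
  return map;
}

// Build a fresh map from a list of agent names.
function buildAgentColorMapSketch(agentNames: string[]): Map<string, string> {
  return extendAgentColorMapSketch(new Map<string, string>(), agentNames);
}
```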

Sling

  • --no-scout-check flag to suppress scout-before-build warning
  • shouldShowScoutWarning() — testable logic for when to warn about missing scouts

Testing

  • 2550 tests across 84 files (6167 expect() calls), up from 2476/83/6044
  • New src/logging/format.test.ts — coverage for event line formatting and color utilities

Fixed

Pi Runtime

  • EventStore visibility — removed stdin-only gate on EventStore writes so Pi agents get full event tracking without stdin payload (ov log tool-start/tool-end)
  • Tool name forwarding — Pi guard extensions now pass --tool-name to ov log calls, fixing missing tool names in event timelines

Shell Completions

  • Added missing --runtime flag to sling completions
  • Synced all shell completion scripts (bash/zsh/fish) with current CLI commands and flags
  • Added --no-scout-check and --all (dashboard) to completions

Feed

  • Restored formatEventLine() usage lost during dashboard-builder merge conflict

Testing Infrastructure

  • Retry temp dir cleanup on EBUSY from SQLite WAL handles (exponential backoff, 5 retries) — fixes flaky cleanup on Windows
  • Tightened cleanupTempDir() ENOENT handling
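
A retry loop of this kind might look like the sketch below. The helper names (`backoffDelays`, `removeDirWithRetry`), the 50 ms base delay, and the use of `force: true` to ignore missing paths are assumptions for illustration; the entry only states exponential backoff with 5 retries on EBUSY.

```typescript
import { rm } from "node:fs/promises";

// Delay schedule for exponential backoff: base, 2*base, 4*base, ...
function backoffDelays(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: one initial attempt plus up to `retries` retries on EBUSY.
async function removeDirWithRetry(dir: string, retries = 5, baseMs = 50): Promise<void> {
  const delays = backoffDelays(retries, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      // force: true makes a missing directory a no-op instead of an ENOENT error.
      await rm(dir, { recursive: true, force: true });
      return;
    } catch (err) {
      const code = (err as { code?: string }).code;
      const delay = delays[attempt];
      // Give up on any other error, or once the retry budget is spent.
      if (code !== "EBUSY" || delay === undefined) throw err;
      await sleep(delay);
    }
  }
}
```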

Changed

  • Dashboard layout restructured from single-column to multi-panel grid with dynamic sizing
  • Feed and dashboard now share centralized event formatting via formatEventLine()
  • Brand color lightened for better terminal contrast

0.7.2 - 2026-02-26

Added

Pi Runtime Enhancements

  • Configurable model alias expansion — PiRuntimeConfig type with provider + modelMap fields so bare aliases like "opus" are correctly expanded to provider-qualified model IDs (e.g., "anthropic/claude-opus-4-6"), configurable via config.yaml runtime.pi section
  • requiresBeaconVerification?() — optional method on AgentRuntime interface; Pi returns false to skip the beacon resend loop that spams duplicate startup messages (Pi's idle/processing states are indistinguishable via pane content)
  • Config validation for runtime.pi.provider and runtime.pi.modelMap entries
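
Alias expansion could work roughly as in this sketch. The field names follow the entry above, but the function name `expandModelAlias`, the pass-through for already qualified IDs, and the fallback for unmapped names are assumptions.

```typescript
// Field names follow the changelog entry; the exact shape is assumed.
interface PiRuntimeConfig {
  provider: string;
  modelMap: Record<string, string>;
}

// Hypothetical helper, not the project's actual function.
function expandModelAlias(model: string, config: PiRuntimeConfig): string {
  // Already provider-qualified ("provider/model"): pass through unchanged.
  if (model.includes("/")) return model;
  const mapped = config.modelMap[model];
  // Assumed fallback: prefix unmapped bare names with the configured provider.
  if (mapped === undefined) return `${config.provider}/${model}`;
  // Accept map values with or without the provider prefix.
  return mapped.includes("/") ? mapped : `${config.provider}/${mapped}`;
}
```

With `provider: "anthropic"` and `modelMap: { opus: "anthropic/claude-opus-4-6" }`, the bare alias `"opus"` expands to the qualified ID quoted in the entry.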

Fixed

Pi Runtime

  • Zombie-state bug — Pi agents were stuck in zombie state because pi-guards.ts used the old () => Extension object-style API instead of the correct (pi: ExtensionAPI) => void factory style; guards were never firing. Rewritten to ExtensionAPI factory format with proper event.toolName and { block, reason } returns
  • Activity tracking — Added pi.on(tool_call/tool_execution_end/session_shutdown) handlers so lastActivity updates and the watchdog no longer misclassifies active Pi agents as zombies
  • Beacon verification loop — sling.ts now skips the beacon resend loop when runtime.requiresBeaconVerification() returns false, preventing duplicate startup messages for Pi agents
  • detectReady() — Fixed to check for Pi TUI header (pi v) + token-usage status bar regex instead of model: which Pi never emits
  • Pi guard extension tests updated for ExtensionAPI format (8 fixes + 7 new tests)

Agent Definitions

  • Replaced 54 hardcoded "bead" references in agent base definitions with tracker-agnostic terminology (task/issue); {{TRACKER_CLI}} and {{TRACKER_NAME}} placeholders remain for CLI commands
  • Fixed overlay fallback default from "bd" to "sd" (seeds is the preferred tracker)

Changed

  • Supervisor agent soft-deprecated — ov supervisor commands marked [DEPRECATED] with stderr warning on start; supervisor removed from default agent manifest and ov init agent-defs copy; supervisor.md retains deprecation notice but code is preserved for backward compatibility
  • biome.json excludes .pi/ directory from linting (generated extension files)

Testing

  • 2476 tests across 83 files (6044 expect() calls)

0.7.1 - 2026-02-26

Added

Pi Runtime Adapter

  • src/runtimes/pi.ts — PiRuntime adapter implementing AgentRuntime for Mario Zechner's Pi coding agent — buildSpawnCommand() maps to pi --model, deployConfig() writes .pi/extensions/overstory-guard.ts + .pi/settings.json, detectReady() looks for Pi TUI header, parseTranscript() handles Pi's top-level message_end / model_change JSONL format
  • src/runtimes/pi-guards.ts — Pi guard extension generator (generatePiGuardExtension()) — produces self-contained TypeScript files for .pi/extensions/ that enforce the same security policies as Claude Code's settings.local.json PreToolUse hooks (team tool blocking, write tool blocking, path boundary enforcement, dangerous bash pattern detection)
  • src/runtimes/types.ts — RuntimeConnection interface for RPC lifecycle: sendPrompt(), followUp(), abort(), getState(), close() — enables direct stdin/stdout communication with runtimes that support it (Pi JSON-RPC), bypassing tmux for mail delivery, shutdown, and health checks
  • src/runtimes/types.ts — RpcProcessHandle and ConnectionState supporting types for the RPC connection interface
  • AgentRuntime.connect?() — optional method on the runtime interface for establishing direct RPC connections; orchestrator checks if (runtime.connect) before calling, falls back to tmux when absent
  • Pi runtime registered in src/runtimes/registry.ts
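
The optional-method check with tmux fallback can be sketched as below. The interfaces are trimmed and their signatures assumed (synchronous for brevity); `deliver` and the `tmuxSend` callback are hypothetical names standing in for the orchestrator's delivery paths.

```typescript
// Trimmed, assumed shapes; the real interfaces have more methods.
interface RuntimeConnection {
  sendPrompt(text: string): void;
  close(): void;
}

interface AgentRuntime {
  name: string;
  // Optional: only runtimes with a direct RPC channel implement this.
  connect?(): RuntimeConnection;
}

// Hypothetical helper: prefer RPC when available, otherwise fall back to tmux.
function deliver(
  runtime: AgentRuntime,
  message: string,
  tmuxSend: (message: string) => void,
): "rpc" | "tmux" {
  if (runtime.connect) {
    const connection = runtime.connect();
    connection.sendPrompt(message);
    connection.close();
    return "rpc";
  }
  tmuxSend(message);
  return "tmux";
}
```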

Guard Rule Extraction

  • src/agents/guard-rules.ts — extracted shared guard constants (NATIVE_TEAM_TOOLS, INTERACTIVE_TOOLS, WRITE_TOOLS, DANGEROUS_BASH_PATTERNS, SAFE_BASH_PREFIXES) from hooks-deployer.ts into a pure data module — single source of truth consumed by both Claude Code hooks and Pi guard extensions

Transcript Path Decoupling

  • transcriptPath field on AgentSession — new nullable column in sessions.db, populated by runtimes that report their transcript location directly instead of relying on ~/.claude/projects/ path inference
  • SessionStore.updateTranscriptPath() — new method to set transcript path per agent
  • ov log transcript resolution — now checks session.transcriptPath first before falling back to legacy ~/.claude/projects/ heuristic; discovered paths are also written back to the session store for future lookups
  • SQLite migration (migrateAddTranscriptPath) adds the column to existing databases safely

runtime.printCommand Config Field

  • OverstoryConfig.runtime.printCommand — new optional config field for routing headless one-shot AI calls (merge resolver, watchdog triage) through a specific runtime adapter, independent of the default interactive runtime

Testing

  • src/runtimes/pi.test.ts — 526-line test suite covering all 7 AgentRuntime methods for the Pi adapter
  • src/runtimes/pi-guards.test.ts — 389-line test suite for Pi guard extension generation across capabilities, path boundaries, and edge cases
  • Test suite: 2458 tests across 83 files (6026 expect() calls)

Fixed

  • Watchdog completion nudges clarified as informational — buildCompletionMessage() now says "Awaiting lead verification" instead of "Ready for merge/cleanup", preventing coordinators from prematurely merging based on watchdog nudges
  • Coordinator PREMATURE_MERGE anti-pattern strengthened — coordinator.md now explicitly states that watchdog nudges are informational only and that only a typed merge_ready mail from the owning lead authorizes a merge
  • transcriptPath: null added to all AgentSession constructions — fixes schema consistency across coordinator, supervisor, monitor, and sling agent creation paths

Changed

  • deployHooks() replaced by runtime.deployConfig() — coordinator, supervisor, monitor, and sling now use the runtime abstraction for deploying hooks/guards instead of calling deployHooks() directly, enabling Pi (and future runtimes) to deploy their native guard mechanisms
  • merge/resolver.ts wired through runtime.buildPrintCommand() — AI-assisted merge resolution (Tier 3 and Tier 4) now uses the configured runtime for headless calls instead of hardcoding claude --print
  • watchdog/triage.ts wired through runtime.buildPrintCommand() — AI-assisted failure triage now uses the configured runtime for headless calls instead of hardcoding claude --print
  • writeOverlay() receives runtime.instructionPath — sling now threads the runtime's instruction file path through overlay generation, so beacon and auto-dispatch messages reference the correct file (e.g. .claude/CLAUDE.md for Claude, same for Pi)

0.7.0 - 2026-02-25

Added

AgentRuntime Abstraction Layer

  • src/runtimes/types.ts — AgentRuntime interface defining the contract for multi-provider agent support: buildSpawnCommand(), buildPrintCommand(), deployConfig(), detectReady(), parseTranscript(), buildEnv(), plus supporting types (SpawnOpts, ReadyState, OverlayContent, HooksDef, TranscriptSummary)
  • src/runtimes/claude.ts — ClaudeRuntime adapter implementing AgentRuntime for Claude Code CLI — delegates to existing subsystems (hooks-deployer, transcript parser) without new behavior
  • src/runtimes/registry.ts — Runtime registry with getRuntime() factory — lookup by name, config default, or hardcoded "claude" fallback
  • docs/runtime-abstraction.md — Design document covering coupling inventory, phased migration plan, and adapter contract rationale
  • --runtime <name> flag on ov sling — allows per-agent runtime override (defaults to config or "claude")
  • runtime.default config field — new optional OverstoryConfig.runtime.default property for setting the default runtime adapter
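
The lookup order described for the registry (explicit name, then config default, then "claude") might be expressed as follows. The parameter list, the error message, and the placeholder "example" runtime are assumptions.

```typescript
// Minimal stand-in for the real interface.
interface AgentRuntime {
  readonly name: string;
}

// Hypothetical registry contents; "example" is a placeholder adapter.
const runtimes = new Map<string, AgentRuntime>([
  ["claude", { name: "claude" }],
  ["example", { name: "example" }],
]);

// Assumed signature: explicit name wins, then the config default, then "claude".
function getRuntime(name?: string, configDefault?: string): AgentRuntime {
  const resolved = name ?? configDefault ?? "claude";
  const runtime = runtimes.get(resolved);
  if (!runtime) throw new Error(`Unknown runtime: ${resolved}`);
  return runtime;
}
```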

Testing

  • src/runtimes/claude.test.ts — 616-line test suite for ClaudeRuntime adapter covering all 7 interface methods
  • src/runtimes/registry.test.ts — Registry tests for name lookup, config default fallback, and unknown runtime errors
  • src/commands/sling.test.ts — Additional sling tests for runtime integration
  • src/agents/overlay.test.ts — Tests for parameterized instructionPath in writeOverlay()
  • 2357 tests across 81 files (5857 expect() calls)

Changed

Runtime Rewiring (Phase 2)

  • src/commands/sling.ts — Rewired to use AgentRuntime.buildSpawnCommand() and detectReady() instead of hardcoded claude CLI construction and TUI heuristics
  • src/commands/coordinator.ts — Rewired to use AgentRuntime for spawn command building, env construction, and TUI readiness detection
  • src/commands/supervisor.ts — Rewired to use AgentRuntime for spawn command building and TUI readiness detection
  • src/commands/monitor.ts — Rewired to use AgentRuntime for spawn command building and env construction
  • src/worktree/tmux.ts — waitForTuiReady() now accepts a detectReady callback instead of hardcoded Claude Code TUI heuristics, making it runtime-agnostic
  • src/agents/overlay.ts — writeOverlay() now accepts an optional instructionPath parameter (default: .claude/CLAUDE.md), enabling runtime-specific instruction file paths

Branding

  • README.md: replaced ASCII ecosystem diagram with os-eco logo image

0.6.12 - 2026-02-25

Added

Shared Visual Primitives

  • src/logging/theme.ts — canonical visual theme for CLI output: agent state colors/icons, event type labels (compact + full), agent color palette for multi-agent displays, separator characters, and header/sub-header rendering helpers
  • src/logging/format.ts — shared formatting utilities: duration formatting (formatDuration), absolute/relative/date timestamp formatting, event detail builder (buildEventDetail), agent color mapping (buildAgentColorMap/extendAgentColorMap), status color helpers for merge/priority/log-level

Theme/Format Adoption Across Observability Commands

  • Dashboard, status, inspect, metrics, run, and costs commands refactored to use shared theme/format primitives — eliminates duplicated color maps, duration formatters, and separator rendering across 6 commands
  • Errors, feed, logs, replay, and trace commands refactored to use shared theme/format primitives — eliminates duplicated event label rendering, timestamp formatting, and agent color assignment across 5 commands
  • Net code reduction: ~826 lines removed, replaced by ~214+132 lines of shared primitives

Mulch Programmatic API Migration

  • MulchClient.record(), search(), and query() migrated from Bun.spawn CLI wrappers to @os-eco/mulch-cli programmatic API — eliminates subprocess overhead for high-frequency expertise operations
  • @os-eco/mulch-cli added as runtime dependency (^0.6.2) — first programmatic API dependency in the ecosystem
  • Variable-based dynamic import pattern (const MULCH_PKG = "..."; import(MULCH_PKG)) prevents tsc from statically resolving into mulch's raw .ts source files
  • Local MulchExpertiseRecord and MulchProgrammaticApi type definitions avoid cross-project noUncheckedIndexedAccess conflicts
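
The variable-based dynamic import pattern can be shown in isolation. In this sketch `node:path` stands in for the real package so the example runs anywhere, and `PathLike` plays the role of the locally declared API type; both are illustrative substitutions.

```typescript
// "node:path" is a stand-in for the real package name.
const PKG = "node:path";

// Local type covering only the members the caller uses.
interface PathLike {
  join: (...parts: string[]) => string;
}

async function loadModule(): Promise<PathLike> {
  // Because the specifier is a variable rather than a string literal, tsc does
  // not resolve the module's types and the result is typed `any`.
  const mod = await import(PKG);
  return mod;
}
```

The trade-off is that type safety now rests on the local declarations staying in sync with the package.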

MetricsStore Improvements

  • countSessions() method — returns total session count without the LIMIT cap that getRecentSessions() applies, fixing accurate session count reporting in metrics views

Lead Agent Workflow Improvements

  • WORKTREE_ISSUE_CREATE failure mode — prevents leads from running {{TRACKER_CLI}} create in worktrees, where issues are lost on cleanup
  • Lead workflow updated to mail coordinator for issue creation instead of direct tracker CLI calls — coordinator creates issues on main branch
  • Scout/builder/reviewer spawning simplified with --skip-task-check — removes the pattern of creating separate tracker issues for each sub-agent
  • {{TRACKER_CLI}} create removed from lead capabilities list

Testing

  • Test suite grew from 2283 to 2288 tests across 79 files (5744 expect() calls)

Changed

  • 12 observability commands consolidated onto shared theme.ts + format.ts primitives — reduces per-command boilerplate and ensures visual consistency across all CLI output
  • @types/js-yaml added as dev dependency (^4.0.9)

Fixed

  • Static imports of theme.ts/format.ts replaced with variable-based dynamic pattern to fix typecheck errors when tsc follows into mulch's raw .ts source files
  • getRecentSessions() limit cap no longer affects session count reporting — dedicated countSessions() method provides uncapped counts

0.6.11 - 2026-02-25

Added

Per-Lead Agent Budget Ceiling

  • agents.maxAgentsPerLead config (default: 5) — limits how many active children a single lead agent can spawn; set to 0 for unlimited
  • --max-agents <n> flag on ov sling — CLI override for the per-lead ceiling when spawning under a parent
  • checkParentAgentLimit() — pure-function guard that counts active children per parent and blocks spawns at the limit
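
A pure-function guard of this shape might look like the sketch below. The session fields, the set of states counted as active, and the return shape are assumptions; only the function name, the default of 5, and the "0 means unlimited" rule come from the entries above.

```typescript
// Assumed session shape; the real AgentSession has more fields.
interface AgentSessionLike {
  agentName: string;
  parentAgent: string | null;
  state: "booting" | "working" | "stalled" | "zombie" | "completed";
}

// Assumption: zombie and completed children do not count toward the budget.
const ACTIVE_STATES = new Set<string>(["booting", "working", "stalled"]);

function checkParentAgentLimit(
  sessions: AgentSessionLike[],
  parent: string,
  maxAgentsPerLead: number,
): { allowed: boolean; activeChildren: number } {
  const activeChildren = sessions.filter(
    (s) => s.parentAgent === parent && ACTIVE_STATES.has(s.state),
  ).length;
  // A ceiling of 0 means unlimited.
  const allowed = maxAgentsPerLead === 0 || activeChildren < maxAgentsPerLead;
  return { allowed, activeChildren };
}
```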

Dispatch-Level Overrides

  • --skip-review flag on ov sling — instructs a lead agent to skip Phase 3 review and self-verify instead (reads builder diff + runs quality gates)
  • --dispatch-max-agents <n> flag on ov sling — per-lead agent ceiling override injected into the overlay so the lead knows its budget
  • formatDispatchOverrides() in overlay system — generates a ## Dispatch Overrides section in lead overlays when skipReview or maxAgentsOverride are set
  • dispatch-overrides section in agents/lead.md — documents the override protocol so leads know to check their overlay before following the default three-phase workflow
  • DispatchPayload extended with skipScouts, skipReview, and maxAgents optional fields

Duplicate Lead Prevention

  • checkDuplicateLead() — prevents two lead agents from concurrently working the same task ID, avoiding the duplicate work stream anti-pattern (overstory-gktc postmortem)

Mail Refactoring

  • shouldAutoNudge() and isDispatchNudge() exported from mail.ts for testability — previously inlined logic now unit-testable
  • AUTO_NUDGE_TYPES exported as ReadonlySet for direct test assertions

Testing

  • sling.test.ts — expanded (201 lines added) covering checkDuplicateLead, checkParentAgentLimit, per-lead budget ceiling enforcement, and dispatch override validation
  • overlay.test.ts — expanded (236 lines added) covering formatDispatchOverrides, skip-review overlay, max-agents overlay, and combined overrides
  • mail.test.ts — expanded (64 lines added) covering shouldAutoNudge, isDispatchNudge, and dispatch nudge behavior
  • hooks-deployer.test.ts — new test file (105 lines) covering hooks deployment and configurable safe prefix extraction
  • config.test.ts — expanded (22 lines added) covering maxAgentsPerLead validation

Changed

  • Terminology normalization — replaced "beads" with "task" throughout CLI copy and generic code: checkBeadLock → checkTaskLock, {{BEAD_ID}} → {{TASK_ID}} in overlay template, error messages updated ("Bead is already being worked" → "Task is already being worked")
  • README unified to canonical os-eco template — shortened, restructured with table-based CLI reference, consistent badge style
  • agents/lead.md — added dispatch-overrides section documenting SKIP REVIEW and MAX AGENTS override protocol
  • Default tracker name changed from "beads" to "seeds" in overlay fallback

Fixed

  • ov trace description — changed from "agent/bead" to "agent or task" for consistency with terminology normalization

Testing

  • 2283 tests across 79 files (5749 expect() calls)

0.6.10 - 2026-02-25

Added

New CLI Commands

  • ov ecosystem — dashboard showing all installed os-eco tools (overstory, mulch, seeds, canopy) with version info, update status (current vs latest from npm), and overstory doctor health summary; supports --json output
  • ov upgrade — upgrade overstory (or all ecosystem tools with --all) to their latest npm versions via bun install -g; --check flag compares versions without installing; supports --json output

ov doctor Enhancements

  • --fix flag — auto-fix capability for doctor checks; fixable checks now include repair closures that are executed when --fix is passed, with human-readable action summaries
  • Fix closures added to all check modules — structure, databases, merge-queue, and ecosystem checks now return fix functions that can recreate missing directories, reinitialize databases, and reinstall tools
  • ecosystem check category — new 10th doctor category validating that os-eco CLI tools (ml, sd, cn) are on PATH and report valid semver versions; fix closures reinstall via bun install -g

Global CLI Flag

  • --timing flag — prints command execution time to stderr after any command completes (e.g., Done in 42ms)

Configurable Quality Gates

  • Quality gate placeholders in agent prompts — agent base definitions (builder, merger, reviewer, lead) now use {{QUALITY_GATE_*}} placeholders instead of hardcoded bun test/bun run lint/bun run typecheck commands, driven by project.qualityGates config
  • 4 quality gate formatter functions — formatQualityGatesInline, formatQualityGateSteps, formatQualityGateBash, formatQualityGateCapabilities added to overlay system for flexible placeholder resolution
  • Configurable safe command prefixes — SAFE_BASH_PREFIXES in hooks-deployer now dynamically extracted from quality gate config via extractQualityGatePrefixes(), replacing hardcoded bun test/bun run lint/bun run typecheck entries
  • Config-driven hooks deployment — sling.ts now passes config.project.qualityGates through to deployHooks() so non-implementation agents can run project-specific quality gate commands

Testing

  • ecosystem.test.ts — new test file (307 lines) covering ecosystem command output, JSON mode, and tool detection
  • upgrade.test.ts — new test file (46 lines) covering upgrade command registration and option parsing
  • databases.test.ts — new test file (38 lines) covering database health check fix closures
  • merge-queue.test.ts — new test file (98 lines) covering merge queue health check and fix closures
  • structure.test.ts — expanded (131 lines added) covering structure check fix closures for missing directories
  • overlay.test.ts — expanded (157 lines added) covering quality gate formatters and placeholder resolution
  • hooks-deployer.test.ts — expanded (52 lines added) covering configurable safe prefix extraction

Changed

  • Agent base definitions updated — builder, merger, reviewer, and lead .md files now use {{QUALITY_GATE_*}} template placeholders instead of hardcoded bun commands
  • DEFAULT_QUALITY_GATES consolidated — removed duplicate definition from overlay.ts, now imported from config.ts as single source of truth

Fixed

  • DoctorCheck.fix return type — changed from void to string[] so fix closures can report what actions were taken
  • Feed follow-mode --json output — now uses jsonOutput envelope instead of raw JSON.stringify
  • --timing preAction — correctly reads opts.timing from global options instead of hardcoded check
  • process.exit(1) in completions.ts — replaced with process.exitCode = 1; return to avoid abrupt process termination

Testing

  • 2241 tests across 79 files (5694 expect() calls)

0.6.9 - 2026-02-25

Added

ov init Enhancements

  • --yes / -y flag — skip interactive confirmation prompts for scripted/automated initialization (contributed by @lucabarak via PR #37)
  • --name <name> flag — explicitly set the project name instead of auto-detecting from git remote or directory name

Standardized JSON Output Across All Commands

  • JSON envelope applied to all remaining commands — four batches (A, B, C, D) migrated every --json code path to use the jsonOutput()/jsonError() envelope format ({ success, command, ...data }), completing the ecosystem-wide standardization started in 0.6.8

Accented ID Formatting

  • accent() applied to IDs in human-readable output — agent names, mail IDs, group IDs, run IDs, and task IDs now render with accent color formatting across status, dashboard, inspect, agents, mail, merge, group, run, trace, and errors commands

Testing

  • hooks-deployer.test.ts — new test file (180 lines) covering hooks deployment to worktrees
  • init.test.ts — new test file (104 lines) covering --yes and --name flag behavior

Changed

Print Helper Adoption

  • Completions, prime, and watch commands migrated to print helpers — remaining commands that used raw console.log/console.error now use printSuccess/printWarning/printError/printHint for consistent output formatting

Fixed

  • PATH prefix for hook commands — deployed hooks now include ~/.bun/bin in the PATH prefix, fixing resolution failures when bun-installed CLIs (like ov itself) weren't found by hook subprocesses
  • Reinit messaging for --yes flag — corrected output messages when re-initializing an existing .overstory/ directory with the --yes flag

Testing

  • 2186 tests across 77 files (5535 expect() calls)

0.6.8 - 2026-02-25

Added

Standardized CLI Output Helpers

  • jsonOutput() / jsonError() helpers (src/json.ts) — standard JSON envelope format ({ success, command, ...data }) matching the ecosystem convention used by mulch, seeds, and canopy
  • printSuccess() / printWarning() / printError() / printHint() helpers (src/logging/color.ts) — branded message formatters with consistent color/icon treatment (brand checkmark, yellow !, red cross, dim indent)
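
The envelope format can be sketched as a pair of helpers. These versions return the serialized string instead of printing, and the `error` field name is an assumption; only the `{ success, command, ...data }` shape comes from the entry above.

```typescript
// Sketch only: the real helpers in src/json.ts may print rather than return.
function jsonOutput(command: string, data: Record<string, unknown> = {}): string {
  // Payload fields are spread alongside the fixed envelope keys.
  return JSON.stringify({ success: true, command, ...data });
}

// The `error` field name is hypothetical.
function jsonError(command: string, error: string): string {
  return JSON.stringify({ success: false, command, error });
}
```

A consumer can then branch on `success` and read `command` without knowing the per-command payload.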

Enhanced CLI Help & Error Experience

  • Custom branded help screen — ov --help now shows a styled layout with colored command names, dim arguments, and version header instead of Commander.js defaults
  • --version --json flag — ov -v --json outputs machine-readable JSON ({ name, version, runtime, platform })
  • Unknown command fuzzy matching — typos like ov stauts now suggest the closest match via Levenshtein edit distance ("Did you mean 'status'?")
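
Suggestion by Levenshtein edit distance can be illustrated as follows. The function names and the `maxDistance` threshold of 2 are assumptions; the changelog does not state the cutoff used.

```typescript
// Classic two-row dynamic-programming Levenshtein distance.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          (prev[j] ?? 0) + 1, // deletion
          (curr[j - 1] ?? 0) + 1, // insertion
          (prev[j - 1] ?? 0) + cost, // substitution
        ),
      );
    }
    prev = curr;
  }
  return prev[b.length] ?? 0;
}

// Hypothetical helper: closest command within maxDistance, or undefined.
function suggestCommand(input: string, commands: string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const command of commands) {
    const distance = editDistance(input, command);
    if (distance < bestDistance) {
      best = command;
      bestDistance = distance;
    }
  }
  return best;
}
```

The typo `stauts` is two substitutions away from `status`, so it falls inside the assumed threshold.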

TUI Trust Dialog Handling

  • Auto-confirm workspace trust dialog — waitForTuiReady now detects "trust this folder" prompts and sends Enter automatically, preventing agents from stalling on first-time workspace access

Changed

Consistent Message Formatting Across All Commands

  • All 30 commands migrated to message helpers — three batches (A, B, C) updated every command to use printSuccess/printWarning/printError/printHint instead of ad-hoc console.log/console.error calls, ensuring uniform output style
  • Global error handler uses jsonError() — top-level catch in index.ts now outputs structured JSON envelopes when --json is passed, instead of raw console.error

TUI Readiness Detection

  • Two-phase readiness check — waitForTuiReady now requires both a prompt indicator (the prompt glyph or Try ") AND status bar text (bypass permissions or shift+tab) before declaring the TUI ready, preventing premature beacon submission
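
The two-phase check reduces to requiring one match from each marker group. In this sketch the function name and signature are hypothetical, and the marker strings are passed in by the caller rather than hardcoded.

```typescript
// Hypothetical helper: ready only when BOTH marker groups match the pane text.
function isTuiReady(
  paneContent: string,
  promptMarkers: string[],
  statusMarkers: string[],
): boolean {
  const hasPrompt = promptMarkers.some((marker) => paneContent.includes(marker));
  const hasStatusBar = statusMarkers.some((marker) => paneContent.includes(marker));
  return hasPrompt && hasStatusBar;
}
```

Requiring both groups avoids declaring readiness while only part of the TUI has rendered.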

Agent Definition Cleanup

  • Slash-command prompts moved to .claude/commands/ — issue-reviews.md, pr-reviews.md, prioritize.md, and release.md removed from agents/ directory (they are skill definitions, not agent base definitions)
  • Agent definition wording updates — minor reference fixes across coordinator, lead, merger, reviewer, scout, and supervisor base definitions

Fixed

  • color.test.ts mocking — tests now mock process.stdout.write/process.stderr.write instead of console.log/console.error to match actual implementation
  • mulch client test updated for auto-create domain behavior
  • mulch → ml alias in tests — test files migrated to use the ml short alias consistently

Testing

  • 2167 tests across 77 files (5465 expect() calls)

0.6.7 - 2026-02-25

Fixed

Permission Flag Migration

  • Replace --dangerously-skip-permissions with --permission-mode bypassPermissions across all agent spawn paths (coordinator, supervisor, sling, monitor) — adapts to updated Claude Code CLI flag naming

Status Output

  • Remove remaining emoji from ov status output — section headers (Agents, Worktrees, Mail, Merge queue, Sessions recorded) and deprecation warning now use plain text; alive markers use colored >/x instead of emoji

Changed

Agent Spawn Reliability

  • Increase TUI readiness timeout from 15s to 30s — waitForTuiReady now waits longer for Claude Code TUI to initialize, reducing false-negative timeouts on slower machines
  • Smarter TUI readiness detection — waitForTuiReady now checks for actual TUI markers (the prompt glyph or Try " text) instead of any pane content, preventing premature readiness signals
  • Extend follow-up Enter delays — beacon submission retries expanded from [1s, 2s] to [1s, 2s, 3s, 5s] in sling, coordinator, and supervisor, improving reliability when Claude Code TUI initializes slowly

Testing

  • 2151 tests across 76 files (5424 expect() calls)

0.6.6 - 2026-02-24

Changed

CLI Alias Migration

  • overstory → ov across all CLI-facing text — every user-facing string, error message, help text, and command comment across all src/commands/*.ts files now references ov instead of overstory
  • mulch → ml in agent definitions and overlay — all 8 base agent definitions (agents/*.md), overlay template (templates/overlay.md.tmpl), and overlay generator (src/agents/overlay.ts) updated to use the ml short alias
  • Templates and hooks updated — templates/CLAUDE.md.tmpl, templates/hooks.json.tmpl, and deployed agent defs all reference ov/ml aliases
  • Canopy prompts re-emitted — all canopy-managed prompts regenerated with alias-aware content

Emoji-Free CLI Output (Set D Icons)

  • Status icons replaced with ASCII Set D — dashboard, status, and sling output now use > (working), - (booting), ! (stalled), x (zombie/completed), ? (unknown) instead of Unicode circles and checkmarks
  • All emoji removed from CLI output — warning prefixes, launch messages, and status indicators no longer use emoji characters, improving compatibility with terminals that lack Unicode support

Added

Sling Reliability

  • Auto-dispatch mail before tmux session — buildAutoDispatch() sends dispatch mail to the agent's mailbox before creating the tmux session, eliminating the race where coordinator dispatch arrives after the agent boots and sits idle
  • Beacon verification loop — after beacon send, sling polls the tmux pane up to 5 times (2s intervals) to detect if the agent is still on the welcome screen; if so, resends the beacon automatically (fixes overstory-3271)
  • capturePaneContent() exported from tmux.ts — new helper for reading tmux pane text, used by beacon verification

Binary Detection

  • detectOverstoryBinDir() tries both ov and overstory — loops through both command names when resolving the binary directory, ensuring compatibility regardless of how the tool was installed

Claude Code Skills

  • /release skill — prepares releases by analyzing changes, bumping versions, updating CHANGELOG/README/CLAUDE.md
  • /issue-reviews skill — reviews GitHub issues from within Claude Code
  • /pr-reviews skill — reviews GitHub pull requests from within Claude Code

Testing

  • Test suite: 2151 tests across 76 files (5424 expect() calls)

Fixed

  • Mail dispatch race for newly slung agents — dispatch mail is now written to SQLite before tmux session creation, ensuring it exists when the agent's SessionStart hook fires ov mail check
  • process.exit(1) replaced with process.exitCode = 1 — CLI entry point no longer calls process.exit() directly, allowing Bun to clean up gracefully (async handlers, open file descriptors)
  • Remaining beadId → taskId references — completed rename in trace.ts, trace.test.ts, spec.ts, worktree.test.ts, and canopy prompts for coordinator/supervisor
  • Post-merge quality gate failures — fixed lint and type errors introduced during multi-agent merge sessions
  • Mail test assertions — updated to match lowercase Warning/Note output after emoji removal

0.6.5 - 2026-02-24

Added

Seeds Preservation for Lead Branches

  • preserveSeedsChanges() in worktree manager — extracts .seeds/ diffs from lead agent branches and applies them to the canonical branch via patch before worktree cleanup, preventing loss of issue files created by leads whose branches are never merged through the normal merge pipeline
  • Integrated into overstory worktree clean — automatically preserves seeds changes before removing completed worktrees

Merge Union Gitattribute Support

  • resolveConflictsUnion() in merge resolver — new auto-resolve strategy for files with merge=union gitattribute that keeps all lines from both sides (canonical + incoming), relying on dedup-on-read to handle duplicates
  • checkMergeUnion() helper — queries git check-attr merge to detect union merge strategy per file
  • Auto-resolve tier now checks gitattributes before choosing between keep-incoming and union resolution strategies
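
Union resolution over standard two-way conflict markers can be sketched as below. This illustrates the "keep all lines from both sides" idea only: the signature is assumed, and diff3-style base sections (`|||||||`) are not handled.

```typescript
// Sketch: drop the conflict marker lines and keep every content line from
// both sides, in file order (canonical side first, then incoming).
function resolveConflictsUnion(content: string): string {
  const kept: string[] = [];
  for (const line of content.split("\n")) {
    const isMarker =
      line.startsWith("<<<<<<< ") || line === "=======" || line.startsWith(">>>>>>> ");
    if (!isMarker) kept.push(line);
  }
  return kept.join("\n");
}
```

Duplicate lines that appear on both sides are kept twice, which matches the entry's note that deduplication happens on read.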

Sling Preflight

  • ensureTmuxAvailable() preflight in sling command — verifies tmux is available before attempting session creation, providing a clear error instead of cryptic spawn failures

Testing

  • Test suite: 2145 tests across 76 files (5410 expect() calls)

Changed

  • beadId → taskId rename across all TypeScript source — comprehensive rename of the beadId field to taskId in all source files, types, interfaces, and tests, completing the tracker abstraction naming migration started in v0.6.0
  • gatherStatus() uses evaluateHealth() — status command now applies the full health evaluation from the watchdog module for agent state reconciliation, matching dashboard and watchdog behavior (handles tmux-dead→zombie, persistent capability booting→working, and time-based stale/zombie detection)

Fixed

  • Single quote escaping in blockGuard shell commands — fixed shell escaping in blockGuard patterns that could cause guard failures when arguments contained single quotes
  • Dashboard version from package.json — dashboard now reads version dynamically from package.json instead of a hardcoded value
  • Seeds config project name — renamed project from "seeds" to "overstory" in .seeds/config.yaml and fixed 71 misnamed issue IDs

0.6.4 - 2026-02-24

Added

Commander.js CLI Framework

  • Full CLI migration to Commander.js — all 30+ commands migrated from custom args array parsing to Commander.js with typed options, subcommand hierarchy, and auto-generated --help; migration completed in 6 incremental commits covering core workflow, nudge, mail, observability, infrastructure, and final cleanup
  • Shell completions via Commander — createCompletionsCommand() now uses Commander's built-in completion infrastructure

Chalk v5 Color System

  • Chalk-based color module — src/logging/color.ts rewritten from custom ANSI escape code strings to Chalk v5 wrapper functions with native NO_COLOR/FORCE_COLOR/TERM=dumb support
  • Brand palette — three named brand colors exported: brand (forest green), accent (amber), muted (stone gray) via chalk.rgb()
  • Chainable color API — color.bold, color.dim, color.red, etc. now delegate to Chalk for composable styling

Testing

  • Merge queue SQL schema consistency tests added
  • Test suite: 2128 tests across 76 files (5360 expect() calls)

Changed

  • Runtime dependencies — chalk v5 added as first runtime dependency (previously zero runtime deps); chalk is ESM-only and handles color detection natively
  • CLI parsing — all commands converted from manual args array indexing to Commander.js .option() / .argument() declarations with automatic type coercion and validation
  • Color module API — color export changed from a record of ANSI string constants to a record of Chalk wrapper functions; consumers call color.red("text") (function) instead of ${color.red}text${color.reset} (string interpolation)
  • noColor identity function — replaces the old color.white default for cases where no coloring is needed

Fixed

  • Merge queue migration — added missing bead_id → task_id column migration for merge-queue.db, aligning with the schema migration already applied to sessions.db, events.db, and metrics.db in v0.6.0
  • npm publish auth — fixed authentication issues in publish workflow and cleaned up post-merge artifacts from Commander migration
  • Commander direct parse — fixed 6 command wrapper functions that incorrectly delegated to Commander instead of using direct .action() pattern (metrics, replay, status, trace, supervisor, and others)

0.6.3 - 2026-02-24

Added

Interactive Tool Blocking for Agents

  • PreToolUse guards block interactive tools: AskUserQuestion, EnterPlanMode, and EnterWorktree are now blocked for all overstory agents via hooks-deployer, preventing indefinite hangs in non-interactive tmux sessions; agents must use overstory mail --type question to escalate instead

Doctor Ecosystem CLI Checks

  • Expanded overstory doctor dependency checks — now validates all ecosystem CLIs (overstory, mulch, seeds, canopy) with alias availability checks (ov, ml) and install hints (npm install -g @os-eco/<pkg>)
  • Short alias detection: when a primary tool passes, doctor also checks if its short alias (e.g., ov for overstory, ml for mulch) is available, with actionable fix hints

CLI Improvements

  • ov short alias: overstory CLI is now also available as ov via package.json bin entry
  • /prioritize skill — new Claude Code command that analyzes open GitHub Issues and Seeds issues, cross-references with codebase health, and recommends the top ~5 issues to tackle next
  • Skill headers — all Claude Code slash commands now include descriptive headers for better discoverability

CI/CD

  • Publish workflow — replaced auto-tag.yml with publish.yml that runs quality gates, checks version against npm, publishes with provenance, creates git tags and GitHub releases automatically

Performance

  • SessionStore.count() — lightweight SELECT COUNT(*) method replacing getAll().length pattern in openSessionStore() existence checks

Testing

  • Test suite grew from 2090 to 2137 tests across 76 files (5370 expect() calls)
  • SQL schema consistency tests for all four SQLite stores (sessions.db, mail.db, events.db, metrics.db)
  • Provider config and model resolution edge case tests
  • Sling provider environment variable injection building block tests

Fixed

  • Tmux dead session detection in waitForTuiReady() — now checks isSessionAlive() on each poll iteration and returns early if the session died, preventing 15-second timeout waits on already-dead sessions
  • ensureTmuxAvailable() guard — new pre-flight check throws a clear AgentError when tmux is not installed, replacing cryptic spawn failures
  • package.json files array — reformatted for Biome compatibility

Changed

  • CI workflow: auto-tag.yml replaced by publish.yml with npm publish, provenance, and GitHub release creation
  • Config field references updated: beads → taskTracker in remaining locations

0.6.2 - 2026-02-24

Added

Sling Guard Improvements

  • --skip-task-check flag for overstory sling — skips task existence validation and issue claiming, designed for leads spawning builders with worktree-created issues that don't exist in the canonical tracker yet
  • Bead lock parent bypass — parent agent can now delegate its own task ID to a child without triggering the concurrent-work lock (sling allows spawn when the lock holder matches --parent)
  • Lead agent --skip-task-check added to default sling template in agents/lead.md
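
The parent bypass described above can be sketched as a pure function. The session shape, the function name checkTaskLock, and the result type are hypothetical; the changelog only states that sling allows the spawn when the lock holder matches --parent.

```typescript
// Hypothetical shapes; the real session records and lock check may differ.
interface ActiveSession {
  agentName: string;
  taskId: string;
}

type LockResult = { locked: false } | { locked: true; heldBy: string };

function checkTaskLock(
  sessions: ActiveSession[],
  taskId: string,
  parent?: string,
): LockResult {
  const holder = sessions.find((s) => s.taskId === taskId);
  if (!holder) return { locked: false };
  // Parent bypass: the lock holder is the agent delegating its own task to a child.
  if (parent !== undefined && holder.agentName === parent) return { locked: false };
  return { locked: true, heldBy: holder.agentName };
}

const active: ActiveSession[] = [{ agentName: "lead-1", taskId: "task-42" }];
const blocked = checkTaskLock(active, "task-42");
const delegated = checkTaskLock(active, "task-42", "lead-1");
```

Here blocked reports the lock because another agent already works task-42, while delegated passes because the requesting parent is the lock holder itself.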

Lead Agent Spec Writing

  • Leads now use overstory spec write <id> --body "..." --agent $OVERSTORY_AGENT_NAME instead of Write/Edit tools for creating spec files — enforces read-only tool posture while still enabling spec creation

Testing

  • Test suite grew from 2087 to 2090 tests across 75 files (5137 expect() calls)

Fixed

  • Dashboard health evaluation — dashboard now applies the full evaluateHealth() function from the watchdog module instead of only checking tmux liveness; correctly transitions persistent capabilities (coordinator, monitor) from booting → working when tmux is alive, and detects stale/zombie states using configured thresholds
  • Default tracker resolution to seeds: resolveBackend() now falls back to "seeds" when no tracker directory exists (previously defaulted to "beads")
  • Coordinator beacon uses resolveBackend() — properly resolves "auto" backend instead of a simple conditional that didn't handle auto-detection
  • Doctor dependency checks use resolveBackend() — properly resolves "auto" backend for tracker CLI availability checks instead of assuming beads
  • Hardcoded 'orchestrator' replaced with 'coordinator' — overlay template default parent address, agent definitions (builder, merger, monitor, scout), and test assertions all updated to use coordinator as the default parent/mail recipient

Changed

  • Lead agent definition: Write/Edit tools removed from capabilities, replaced with overstory spec write CLI command
  • Agent definitions (builder, merger, monitor, scout) updated to reference "coordinator" instead of "orchestrator" in mail examples and constraints

0.6.1 - 2026-02-23

Added

Canopy Integration for Agent Prompt Management

  • All 8 agent definitions (agents/*.md) restructured for Canopy prompt composition — behavioral sections (propulsion-principle, cost-awareness, failure-modes, overlay, constraints, communication-protocol, completion-protocol) moved to the top of each file with kebab-case headers, core content sections (intro, role, capabilities, workflow) placed after
  • Section headers converted from Title Case (## Role) to kebab-case (## role) across all agent definitions for Canopy schema compatibility

Hooks Deployer Merge Behavior

  • deployHooks() now preserves existing settings.local.json content when deploying hooks — merges with non-hooks keys (permissions, env, $schema, etc.) instead of overwriting the entire file
  • isOverstoryHookEntry() exported for detecting overstory-managed hook entries — enables stripping stale overstory hooks while preserving user-defined hooks
  • Overstory hooks placed before user hooks per event type so security guards always run first
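
The merge behavior above can be sketched as follows. The settings layout, the managedByOverstory marker, and the mergeSettings name are assumptions made for illustration; the real deployHooks() and isOverstoryHookEntry() may detect managed entries differently.

```typescript
// Hypothetical shapes: the marker field and settings layout are illustrative only.
interface HookEntry {
  command: string;
  managedByOverstory?: boolean;
}
type HooksByEvent = Record<string, HookEntry[]>;
interface Settings {
  hooks?: HooksByEvent;
  [key: string]: unknown;
}

function mergeSettings(existing: Settings, overstoryHooks: HooksByEvent): Settings {
  const existingHooks: HooksByEvent = existing.hooks ?? {};
  const events = Object.keys(existingHooks).concat(
    Object.keys(overstoryHooks).filter((e) => !(e in existingHooks)),
  );
  const merged: HooksByEvent = {};
  for (const event of events) {
    // Strip stale overstory entries but keep user-defined ones.
    const userHooks = (existingHooks[event] ?? []).filter((h) => !h.managedByOverstory);
    // Overstory guards go first so they run before user hooks.
    merged[event] = [...(overstoryHooks[event] ?? []), ...userHooks];
  }
  // Non-hooks keys (permissions, env, $schema, ...) pass through untouched.
  return { ...existing, hooks: merged };
}

const before: Settings = {
  permissions: { allow: ["Bash"] },
  hooks: {
    PreToolUse: [
      { command: "user-lint" },
      { command: "old-guard", managedByOverstory: true },
    ],
  },
};
const after = mergeSettings(before, {
  PreToolUse: [{ command: "guard", managedByOverstory: true }],
});
```

In this sketch the stale old-guard entry is dropped, the fresh guard runs first, and the user's user-lint hook and permissions block survive the deploy.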

Testing

  • Test suite grew from 2075 to 2087 tests across 75 files (5150 expect() calls)

Changed

  • Dogfooding tracker migrated from beads to seeds: .beads/ directory removed, .seeds/ directory added with all issues migrated
  • Biome ignore pattern updated: .beads/ → .seeds/

Fixed

  • deployHooks() no longer overwrites existing settings.local.json — previously deploying hooks for coordinator/supervisor/monitor agents at the project root would destroy any existing settings (permissions, user hooks, env vars)

0.6.0 - 2026-02-23

Added

Tracker Abstraction Layer

  • src/tracker/ module — pluggable task tracker backend system replacing the hardcoded beads dependency
    • TrackerClient interface with unified API: ready(), show(), create(), claim(), close(), list(), sync()
    • TrackerIssue type for backend-agnostic issue representation
    • createTrackerClient() factory function dispatching to concrete backends
    • resolveBackend() auto-detection — probes .seeds/ then .beads/ directories when configured as "auto"
    • trackerCliName() helper returning "sd" or "bd" based on resolved backend
    • Beads adapter (src/tracker/beads.ts) — wraps bd CLI with --json parsing
    • Seeds adapter (src/tracker/seeds.ts) — wraps sd CLI with --json parsing
    • Factory tests (src/tracker/factory.test.ts) — 80 lines covering resolution and client creation
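
A minimal sketch of the auto-detection order, assuming an injected dirExists callback in place of real filesystem access. The signatures are assumptions and not the actual src/tracker/ API; the final fallback follows the "seeds" default introduced by the 0.6.2 fix.

```typescript
// Illustrative signatures only; the real factory module may differ.
type TaskTrackerBackend = "auto" | "beads" | "seeds";
type ResolvedBackend = "beads" | "seeds";

// dirExists is injected so the sketch stays free of filesystem calls.
function resolveBackend(
  configured: TaskTrackerBackend,
  dirExists: (dir: string) => boolean,
): ResolvedBackend {
  if (configured !== "auto") return configured; // explicit config wins
  if (dirExists(".seeds")) return "seeds"; // probed first
  if (dirExists(".beads")) return "beads"; // probed second
  return "seeds"; // fallback when neither directory exists (per the 0.6.2 fix)
}

function trackerCliName(backend: ResolvedBackend): "sd" | "bd" {
  return backend === "seeds" ? "sd" : "bd";
}

const onlyBeads = (dir: string): boolean => dir === ".beads";
const cli = trackerCliName(resolveBackend("auto", onlyBeads));
```

With only a .beads/ directory present, the sketch resolves "auto" to the beads backend and the bd CLI.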

Configurable Quality Gates

  • QualityGate type ({ name, command, description }) in types.ts — replaces hardcoded bun test && bun run lint && bun run typecheck
  • project.qualityGates config field — projects can now define custom quality gate commands in config.yaml
  • DEFAULT_QUALITY_GATES constant in config.ts — preserves the default 3-gate pipeline (Tests, Lint, Typecheck)
  • Quality gate validation in validateConfig() — ensures each gate has non-empty name, command, and description
  • Overlay template renders configured gates dynamically instead of hardcoded commands
  • OverlayConfig.qualityGates field threads gates from config through to agent overlays

Config Migration for Task Tracker

  • taskTracker: { backend, enabled } config field replaces legacy beads: and seeds: sections
  • Automatic migration: beads: { enabled: true } → taskTracker: { backend: "beads", enabled: true } (and same for seeds:)
  • TaskTrackerBackend type: "auto" | "beads" | "seeds" with "auto" as default
  • Deprecation warnings emitted when legacy config keys are detected
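
The migration rule can be sketched as a small function. The function name, the precedence when several keys are present, and the enabled: true default are assumptions; only the beads:/seeds: to taskTracker mapping and the "auto" default come from the entries above.

```typescript
// Hypothetical sketch of the legacy-key migration; not the actual config loader.
interface TaskTrackerConfig {
  backend: "auto" | "beads" | "seeds";
  enabled: boolean;
}
interface RawConfig {
  beads?: { enabled: boolean };
  seeds?: { enabled: boolean };
  taskTracker?: TaskTrackerConfig;
}

function migrateTaskTracker(raw: RawConfig, warn: (msg: string) => void): TaskTrackerConfig {
  // Assumption: an explicit taskTracker section takes precedence over legacy keys.
  if (raw.taskTracker) return raw.taskTracker;
  if (raw.beads) {
    warn("config key 'beads' is deprecated; use 'taskTracker'");
    return { backend: "beads", enabled: raw.beads.enabled };
  }
  if (raw.seeds) {
    warn("config key 'seeds' is deprecated; use 'taskTracker'");
    return { backend: "seeds", enabled: raw.seeds.enabled };
  }
  // Assumption: the tracker is enabled by default when nothing is configured.
  return { backend: "auto", enabled: true };
}

const warnings: string[] = [];
const migrated = migrateTaskTracker({ beads: { enabled: true } }, (m) => warnings.push(m));
```

Passing the warning sink as a parameter keeps the function pure enough to test without capturing console output.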

Template & Agent Definition Updates

  • TRACKER_CLI and TRACKER_NAME template variables in overlay.ts — agent defs no longer hardcode bd/beads
  • All 8 agent definitions (agents/*.md) updated: bd → TRACKER_CLI, beads → TRACKER_NAME
  • Coordinator beacon updated with tracker-aware context
  • Hooks-deployer safe prefixes updated for tracker CLI commands

Hooks Improvements

  • mergeHooksByEventType(): overstory hooks install --force now merges hooks per event type with deduplication instead of wholesale replacement, preserving user-added hooks

Testing

  • Test suite grew from 2026 to 2075 tests across 75 files (5128 expect() calls)

Changed

  • beads → taskTracker config: config.beads renamed to config.taskTracker with backward-compatible migration
  • bead_id → task_id: Column renamed across all SQLite schemas (metrics.db, merge-queue.db, sessions.db, events.db) with automatic migration for existing databases
  • group.ts and supervisor.ts now use tracker abstraction instead of direct beads client calls
  • sling.ts uses resolveBackend() and trackerCliName() from factory module
  • Doctor dependency checks updated to detect the active tracker CLI (bd or sd)

Fixed

  • overstory hooks install --force now merges hooks by event type instead of replacing the entire settings file — preserves non-overstory hooks
  • detectCanonicalBranch() now accepts any branch name (removed restrictive regex)
  • bead_id → task_id SQLite column migration for existing databases (metrics, merge-queue, sessions, events)
  • config.seeds → config.taskTracker bootstrap path in sling.ts
  • group.ts and supervisor.ts now use resolveBackend() for proper tracker resolution instead of hardcoded backend
  • Seeds adapter validates envelope success field before unwrapping response data
  • Hooks tests use literal keys instead of string indexing for noUncheckedIndexedAccess compliance
  • Removed old src/beads/ directory (replaced by src/tracker/)

0.5.9 - 2026-02-21

Added

New CLI Commands

  • overstory stop <agent-name> — explicitly terminate a running agent by killing its tmux session, marking the session as completed in SessionStore, with optional --clean-worktree to remove the agent's worktree (17 tests, DI pattern via StopDeps)

Sling Guard Features

  • Bead lock: checkBeadLock() pure function prevents concurrent agents from working the same bead ID, enforced in slingCommand before spawning
  • Run session cap: checkRunSessionLimit() pure function with maxSessionsPerRun config field (default 0 = unlimited), enforced in slingCommand to limit concurrent agents per run
  • --skip-scout flag — passes through to overlay via OverlayConfig.skipScout, renders SKIP_SCOUT_SECTION in template for lead agents that want to skip scout phase
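
The "0 = unlimited" semantics of the run session cap can be sketched in a few lines. The name isRunAtSessionLimit and its parameters are hypothetical and do not reflect the real checkRunSessionLimit() signature.

```typescript
// Hypothetical helper illustrating the cap semantics; not the real checkRunSessionLimit().
function isRunAtSessionLimit(maxSessionsPerRun: number, activeSessionsInRun: number): boolean {
  // A cap of 0 (the default) means no limit is enforced.
  if (maxSessionsPerRun <= 0) return false;
  return activeSessionsInRun >= maxSessionsPerRun;
}

const unlimited = isRunAtSessionLimit(0, 50);
const atCap = isRunAtSessionLimit(3, 3);
```

Treating the default as "no limit" means existing configs keep their behavior until a project opts in to a cap.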

Agent Pipeline Improvements

  • Complexity-tiered pipeline in lead agent definition — leads now assess task complexity (simple/moderate/complex) before deciding whether to spawn scouts, builders, and reviewers
  • Scouts made optional for simple/moderate tasks (SHOULD vs MUST)
  • Reviewers made optional with self-verification path for simple/moderate tasks
  • SCOUT_SKIP and REVIEW_SKIP failure modes softened to warnings
  • Scout and reviewer agents simplified: replaced INSIGHT: protocol with plain notable findings

Testing

  • Test suite grew from 1996 to 2026 tests across 74 files (5023 expect() calls)

Changed

  • Lead agent role reframed to reflect that leads can be doers for simple tasks, not just delegators
  • Lead propulsion principle updated to assess complexity before acting
  • Lead cost awareness section no longer mandates reviewers

Fixed

  • Biome formatting in stop.test.ts (pre-existing lint issue)

0.5.8 - 2026-02-20

Added

Provider Model Resolution

  • ResolvedModel type and provider gateway support in resolveModel() — resolves ModelRef strings (e.g., openrouter/openai/gpt-5.3) through configured provider gateways with baseUrl and authTokenEnv
  • Provider and model validation in validateConfig() — validates provider types (native/gateway), required gateway fields (baseUrl), and model reference format at config load time
  • Provider environment variables now threaded through all agent spawn commands (sling, coordinator, supervisor, monitor) — gateway authTokenEnv values are passed to spawned agent processes

Mulch Integration

  • Auto-infer mulch domains from file scope in overstory sling: inferDomainsFromFiles() maps file paths to domains (e.g., src/commands/*.ts → cli, src/agents/*.ts → agents) instead of always using configured defaults
  • Outcome flags for MulchClient.record(): --outcome-status, --outcome-duration, --outcome-test-results, --outcome-agent for structured outcome tracking
  • File-scoped search in MulchClient.search(): --file and --sort-by-score options for targeted expertise queries
  • PostToolUse Bash hook in hooks template and init — runs mulch diff after git commits to auto-detect expertise changes
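
A minimal sketch of path-to-domain inference, using only the two example mappings named above. The rule table, the defaults parameter, and the fallback to defaults when nothing matches are assumptions, not the actual inferDomainsFromFiles() implementation.

```typescript
// Illustrative rule table; only these two mappings appear in the changelog.
const DOMAIN_RULES: Array<{ pattern: RegExp; domain: string }> = [
  { pattern: /^src\/commands\/[^/]+\.ts$/, domain: "cli" },
  { pattern: /^src\/agents\/[^/]+\.ts$/, domain: "agents" },
];

function inferDomainsFromFiles(files: string[], defaults: string[]): string[] {
  const found: string[] = [];
  for (const file of files) {
    for (const rule of DOMAIN_RULES) {
      // Record each matching domain once, in first-seen order.
      if (rule.pattern.test(file) && !found.includes(rule.domain)) {
        found.push(rule.domain);
      }
    }
  }
  // Assumption: fall back to the configured defaults when no rule matches.
  return found.length > 0 ? found : defaults;
}

const domains = inferDomainsFromFiles(
  ["src/commands/sling.ts", "src/commands/mail.ts", "src/agents/overlay.ts"],
  ["general"],
);
```

For this file scope the sketch yields the cli and agents domains, so an agent would be primed with expertise relevant to the files it is about to touch.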

Agent Definition Updates

  • Builder completion protocol includes outcome data flags (--outcome-status success --outcome-agent $OVERSTORY_AGENT_NAME)
  • Lead and supervisor agents get file-scoped mulch search capability (mulch search <query> --file <path>)
  • Overlay quality gates include outcome flags for mulch recording

Dashboard Performance

  • limit option added to MailStore.getAll() — dashboard now fetches only the most recent messages instead of the full mailbox
  • Persistent DB connections across dashboard poll ticks — SessionStore, EventStore, MailStore, and MetricsStore connections are now opened once and reused, eliminating per-tick open/close overhead

Testing

  • Test suite grew from 1916 to 1996 tests across 73 files (4960 expect() calls)

Fixed

  • Zombie agent recovery — updateLastActivity now recovers agents from "zombie" state when hooks prove they're alive (previously only recovered from "booting")
  • Dashboard .repeat() crash when negative values were passed — now clamps repeat count to minimum of 0
  • Set-based tmux session lookup in status.ts replacing O(n) array scans with O(1) Set membership checks
  • Subprocess cache in status.ts preventing redundant tmux list-sessions calls during a single status gather
  • Null-runId sessions (coordinator) now included in run-scoped status and dashboard views — previously filtered out when --all was not specified
  • Sparse file used in logs doctor test to prevent timeout on large log directory scans
  • Beacon submission reliability — replaced fixed sleep with poll-based TUI readiness check (PR #19, thanks @dmfaux!)
  • Biome formatting in hooks-deployer test and sling

0.5.7 - 2026-02-19

Added

Provider Types

  • ModelAlias, ModelRef, and ProviderConfig types in types.ts — foundation for multi-provider model routing (native and gateway provider types with baseUrl and authTokenEnv configuration)
  • providers field in OverstoryConfig: Record<string, ProviderConfig> for configuring model providers per project
  • resolveModel() signature updated to accept ModelRef (provider-qualified strings like openrouter/openai/gpt-5.3) alongside simple ModelAlias values

Costs Command

  • --self flag for overstory costs — parse the current orchestrator session's Claude Code transcript directly, bypassing metrics.db, useful for real-time cost visibility without agent infrastructure

Metrics

  • run_id column added to metrics.db sessions table — enables overstory costs --run <id> filtering to work correctly; includes automatic migration for existing databases

Watchdog

  • Phase-aware buildCompletionMessage() in watchdog daemon — generates targeted completion nudge messages based on worker capability composition (single-capability batches get phase-specific messages like "Ready for next phase", mixed batches get a summary with breakdown)

Testing

  • Test suite grew from 1892 to 1916 tests across 73 files (4866 expect() calls)

0.5.6 - 2026-02-18

Added

Safety Guards

  • Root-user pre-flight guard on all agent spawn commands (sling, coordinator start, supervisor start, monitor start) — blocks spawning when running as UID 0, since the claude CLI rejects --dangerously-skip-permissions as root causing tmux sessions to die immediately
  • Unmerged branch safety check in overstory worktree clean — skips worktrees with unmerged branches by default, warns about skipped branches, and requires --force to delete them

Init Improvements

  • .overstory/README.md generation during overstory init — explains the directory to contributors who encounter .overstory/ in a project, whitelisted in .gitignore

Tier 2 Monitor Config Gating

  • overstory monitor start now gates on watchdog.tier2Enabled config flag — throws a clear error when Tier 2 is disabled instead of silently proceeding
  • overstory coordinator start --monitor respects tier2Enabled — skips monitor auto-start with a message when disabled

Tmux Error Handling

  • sendKeys now distinguishes "tmux server not running" from "session not found" — provides actionable error messages for each case (e.g., root-user hint for server-not-running)

Documentation

  • Lead agent definition (agents/lead.md) reframed as coordinator-not-doer — emphasizes the lead's role as a delegation specialist rather than an implementer

Testing

  • Test suite grew from 1868 to 1892 tests across 73 files (4807 expect() calls)

Fixed

  • Biome formatting in merged builder code

0.5.5 - 2026-02-18

Added

Run Scoping

  • overstory status now scopes to the current run by default with --all flag to show all runs — gatherStatus() filters sessions by runId when present
  • overstory dashboard now scopes all panels to the current run by default with --all flag to show data across all runs

Config Local Overrides

  • config.local.yaml support for machine-specific configuration overrides — values in config.local.yaml are deep-merged over config.yaml, allowing per-machine settings (model overrides, paths, watchdog intervals) without modifying the tracked config file (PR #9)
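
Deep-merging a local override over the tracked config can be sketched as below, operating on already-parsed objects. The deepMerge helper and the choice to replace arrays wholesale are assumptions; the changelog only states that config.local.yaml values are deep-merged over config.yaml.

```typescript
// Hypothetical deep-merge sketch; YAML parsing is assumed to have happened already.
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function deepMerge(base: Plain, override: Plain): Plain {
  const result: Plain = { ...base };
  for (const key of Object.keys(override)) {
    const baseValue = result[key];
    const overrideValue = override[key];
    // Recurse into nested objects; scalars and arrays are replaced wholesale (assumption).
    result[key] =
      isPlainObject(baseValue) && isPlainObject(overrideValue)
        ? deepMerge(baseValue, overrideValue)
        : overrideValue;
  }
  return result;
}

const tracked: Plain = {
  models: { builder: "sonnet", scout: "haiku" },
  watchdog: { tier2Enabled: false },
};
const local: Plain = { models: { builder: "opus" } };
const effective = deepMerge(tracked, local);
```

Only models.builder changes in the effective config; models.scout and the watchdog section are inherited from the tracked file, which is what makes per-machine overrides safe to keep out of version control.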

Universal Push Guard

  • PreToolUse hooks template now includes a universal git push guard — blocks all git push commands for all agents (previously only blocked push to canonical branches)

Watchdog Run-Completion Detection

  • Watchdog daemon tick now detects when all agents in the current run have completed and auto-reports run completion

Lead Agent Streaming

  • Lead agents now stream merge_ready messages per-builder as each completes, instead of batching all merge signals — enables earlier merge pipeline starts

Claude Code Command Skills

  • Added issue-reviews and pr-reviews skills for reviewing GitHub issues and pull requests from within Claude Code

Testing

  • Test suite grew from 1848 to 1868 tests across 73 files (4771 expect() calls)

Fixed

  • overstory sling now uses resolveModel() for config-level model overrides — previously ignored models: config section when spawning agents
  • overstory doctor dependency check now detects bd CGO/Dolt backend failures — catches cases where bd binary exists but crashes due to missing CGO dependencies (PR #11)
  • Biome line width formatting in src/doctor/consistency.ts

0.5.4 - 2026-02-17

Added

Reviewer Coverage Enforcement

  • Reviewer-coverage doctor check in overstory doctor — warns when leads spawn builders without corresponding reviewers, reports partial coverage ratios per lead
  • merge_ready reviewer validation in overstory mail send — advisory warning when sending merge_ready without reviewer sessions for the sender's builders

Scout-First Workflow Enforcement

  • Scout-before-builder warning in overstory sling — warns when a lead spawns a builder without having spawned any scouts first
  • parentHasScouts() helper exported from sling for testability

Run Auto-Completion

  • overstory coordinator stop now auto-completes the active run (reads current-run.txt, marks run completed, cleans up)
  • overstory log session-end auto-completes the run when the coordinator exits (handles tmux window close without explicit stop)

Gitignore Wildcard+Whitelist Model

  • .overstory/.gitignore flipped from explicit blocklist to wildcard * + whitelist pattern — ignore everything, whitelist only tracked files (config.yaml, agent-manifest.json, hooks.json, groups.json, agent-defs/)
  • overstory prime auto-heals .overstory/.gitignore on each session start — ensures existing projects get the updated gitignore
  • OVERSTORY_GITIGNORE constant and writeOverstoryGitignore() exported from init.ts for reuse

Testing

  • Test suite grew from 1812 to 1848 tests across 73 files (4726 expect() calls)

Changed

  • Lead agent definition (agents/lead.md) — scouts made mandatory (not optional), Phase 3 review made MANDATORY with stronger language, added SCOUT_SKIP failure mode, expanded cost awareness section explaining why scouts and reviewers are investments not overhead
  • overstory init .gitignore now always overwrites (supports --force reinit and auto-healing)

Fixed

  • Hooks template (templates/hooks.json.tmpl) — removed fragile read -r INPUT; echo "$INPUT" | stdin relay pattern; overstory log now reads stdin directly via --stdin flag
  • readStdinJson() in log command — reads all stdin chunks for large payloads instead of only the first line
  • Doctor gitignore structure check updated for wildcard+whitelist model

0.5.3 - 2026-02-17

Added

Configurable Agent Models

  • models: section in config.yaml — override the default model (sonnet, opus, haiku) for any agent role (coordinator, supervisor, monitor, etc.)
  • resolveModel() helper in agent manifest — resolution chain: config override > manifest default > fallback
  • Supervisor and monitor entries added to agent-manifest.json with model and capability metadata
  • overstory init now seeds the default models: section in generated config.yaml
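
The resolution chain (config override > manifest default > fallback) can be sketched as a single expression. The parameter shapes and the "sonnet" fallback value are assumptions for illustration, not the actual resolveModel() signature.

```typescript
// Hypothetical shapes for the config and manifest; only the precedence order is from the changelog.
type ModelAlias = "sonnet" | "opus" | "haiku";
type ModelOverrides = Partial<Record<string, ModelAlias>>;
type Manifest = Partial<Record<string, { model: ModelAlias }>>;

function resolveModel(
  role: string,
  configModels: ModelOverrides,
  manifest: Manifest,
  fallback: ModelAlias = "sonnet", // assumed fallback value
): ModelAlias {
  // 1. config.yaml models: override, 2. manifest default, 3. fallback.
  return configModels[role] ?? manifest[role]?.model ?? fallback;
}

const manifest: Manifest = { coordinator: { model: "opus" }, monitor: { model: "sonnet" } };
const overrides: ModelOverrides = { monitor: "haiku" };
const monitorModel = resolveModel("monitor", overrides, manifest);
```

In this sketch the monitor role resolves to the configured override, while the coordinator keeps its manifest default because no override exists for it.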

Testing

  • Test suite grew from 1805 to 1812 tests across 73 files (4638 expect() calls)

0.5.2 - 2026-02-17

Added

New Flags

  • --into <branch> flag for overstory merge — target a specific branch instead of always merging to canonicalBranch

Session Branch Tracking

  • overstory prime now records the orchestrator's starting branch to .overstory/session-branch.txt at session start
  • overstory merge reads session-branch.txt as the default merge target when --into is not specified — resolution chain: --into flag > session-branch.txt > config canonicalBranch

Testing

  • Test suite grew from 1793 to 1805 tests across 73 files (4615 expect() calls)

Changed

  • Git push blocking for agents now blocks ALL git push commands (previously only blocked push to canonical branches) — agents should use overstory merge instead
  • Init-deployed hooks now include a PreToolUse Bash guard that blocks git push for the orchestrator's project

Fixed

  • Test cwd pollution in agents test afterEach — restored cwd to prevent cross-file pollution

0.5.1 - 2026-02-16

Added

New CLI Commands

  • overstory agents discover — discover and query agents by capability, state, file scope, and parent with --capability, --state, --parent filters and --json output

New Subsystems

  • Session insight analyzer (src/insights/analyzer.ts) — analyzes EventStore data from completed sessions to extract structured patterns about tool usage, file edits, and errors for automatic mulch expertise recording
  • Conflict history intelligence in merge resolver — tracks past conflict resolution patterns per file to skip historically-failing tiers and enrich AI resolution prompts with successful strategies

Agent Improvements

  • INSIGHT recording protocol for agent definitions — read-only agents (scout, reviewer) use INSIGHT prefix for structured expertise observations; parent agents (lead, supervisor) record insights to mulch automatically

Testing

  • Test suite grew from 1749 to 1793 tests across 73 files (4587 expect() calls)

Changed

  • session-end hook now calls mulch record directly instead of sending mulch_learn mail messages — removes mail indirection for expertise recording

Fixed

  • Coordinator tests now always inject fake monitor/watchdog for proper isolation

0.5.0 - 2026-02-16

Added

New CLI Commands

  • overstory feed — unified real-time event stream across all agents with --follow mode for continuous polling, agent/run filtering, and JSON output
  • overstory logs — query NDJSON log files across agents with level filtering (--level), time range queries (--since/--until), and --follow tail mode
  • overstory costs --live — real-time token usage display for active agents

New Flags

  • --monitor flag for coordinator start/stop/status — manage the Tier 2 monitor agent alongside the coordinator

Agent Improvements

  • Mulch recording as required completion gate for all agent types — agents must record learnings before session close
  • Mulch learn extraction added to Stop hooks for orchestrator and all agents
  • Scout-spawning made default in lead.md Phase 1 with parallel support
  • Reviewer spawning made mandatory in lead.md

Infrastructure

  • Real-time token tracking infrastructure (src/metrics/store.ts, src/commands/costs.ts) — live session cost monitoring via transcript JSONL parsing

Testing

  • Test suite grew from 1673 to 1749 tests across 71 files (4460 expect() calls)

Fixed

  • Duplicate feed entry in CLI command router and help text

0.4.1 - 2026-02-16

Added

New CLI Commands & Flags

  • overstory --completions <shell> — shell completion generation for bash, zsh, and fish
  • --quiet / -q global flag — suppress non-error output across all commands
  • overstory mail send --to @all — broadcast messaging with group addresses (@all, @builders, @scouts, @reviewers, @leads, @mergers, etc.)

Output Control

  • Central NO_COLOR convention support (src/logging/color.ts) — respects NO_COLOR, FORCE_COLOR, and TERM=dumb environment variables per https://no-color.org
  • All ANSI color output now goes through centralized color module instead of inline escape codes

Infrastructure

  • Merge queue migrated from JSON file to SQLite (merge-queue.db) for durability and concurrent access

Testing

  • Test suite grew from 1612 to 1673 tests across 69 files (4267 expect() calls)

Fixed

  • Freeze duration counter for completed/zombie agents in status and dashboard displays

0.4.0 - 2026-02-15

Added

New CLI Commands

  • overstory doctor — comprehensive health check system with 9 check modules (dependencies, config, structure, databases, consistency, agents, merge-queue, version, logs) and formatted output with pass/warn/fail status
  • overstory inspect <agent> — deep per-agent inspection aggregating session data, metrics, events, and live tmux capture with --follow polling mode

New Flags

  • --watchdog flag for coordinator start — auto-starts the watchdog daemon alongside the coordinator
  • --debounce <ms> flag for mail check — prevents excessive mail checking by skipping if called within the debounce window
  • PostToolUse hook entry for debounced mail checking

Observability Improvements

  • Automated failure recording in watchdog via mulch — records failure patterns for future reference
  • Mulch learn extraction in log session-end — captures session insights automatically
  • Mulch health checks in overstory clean — validates mulch installation and domain health during cleanup

Testing

  • Test suite grew from 1435 to 1612 tests across 66 files (3958 expect() calls)

Fixed

  • Wire doctor command into CLI router and update command groups

0.3.0 - 2026-02-13

Added

New CLI Commands

  • overstory run command — orchestration run lifecycle management (list, show, complete subcommands) with RunStore backed by sessions.db
  • overstory trace command — agent/bead timeline viewing for debugging and post-mortem observability
  • overstory clean command — cleanup worktrees, sessions, and artifacts with auto-cleanup on agent teardown

Observability & Persistence

  • Run tracking via run_id integrated into sling and clean commands
  • RunStore in sessions.db for durable run state
  • SessionStore (SQLite) — migrated from sessions.json for concurrent access and crash safety
  • Phase 2 CLI query commands and Phase 3 event persistence for the observability pipeline

Agent Improvements

  • Project-scoped tmux naming (overstory-{projectName}-{agentName}) to prevent cross-project session collisions
  • ENV_GUARD on all hooks — prevents hooks from firing outside overstory-managed worktrees
  • Mulch-informed lead decomposition — leader agents use mulch expertise when breaking down tasks
  • Mulch conflict pattern recording — merge resolver records conflict patterns to mulch for future reference

MulchClient Expansion

  • New commands and flags for the mulch CLI wrapper
  • --json parsing support with corrected types and flag spread

Community & Documentation

  • STEELMAN.md — comprehensive risk analysis for agent swarm deployments
  • Community files: CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md
  • Package metadata (keywords, repository, homepage) for npm/GitHub presence

Testing

  • Test suite grew from 912 to 1435 tests across 55 files (3416 expect() calls)

Fixed

  • Fix isCanonicalRoot guard blocking all worktree overlays when dogfooding overstory on itself
  • Fix auto-nudge tmux corruption and deploy coordinator hooks correctly
  • Fix 4 P1 issues: orchestrator nudge routing, bash guard bypass, hook capture isolation, overlay guard
  • Fix 4 P1/P2 issues: ENV_GUARD enforcement, persistent agent state, project-scoped tmux kills, auto-nudge coordinator
  • Strengthen agent orchestration with additional P1 bug fixes

Changed

  • CLI commands grew from 17 to 20 (added run, trace, clean)

0.2.0 - 2026-02-13

Added

Coordinator & Supervisor Agents

  • overstory coordinator command — persistent orchestrator that runs at project root, decomposes objectives into subtasks, dispatches agents via sling, and tracks batches via task groups
    • start / stop / status subcommands
    • --attach / --no-attach with TTY-aware auto-detection for tmux sessions
    • Scout-delegated spec generation for complex tasks
  • Supervisor agent definition — per-project team lead (depth 1) that receives dispatch mail from coordinator, decomposes into worker-sized subtasks, manages worker lifecycle, and escalates unresolvable issues
  • 7 base agent types (added coordinator + supervisor to existing scout, builder, reviewer, lead, merger)

Task Groups & Session Lifecycle

  • overstory group command — batch coordination (create / status / add / remove / list) with auto-close when all member beads issues complete, mail notification to coordinator on auto-close
  • Session checkpoint save/restore for compaction survivability (prime --compact restores from checkpoint)
  • Handoff orchestration (initiate/resume/complete) for crash recovery

Typed Mail Protocol

  • 8 protocol message types: worker_done, merge_ready, merged, merge_failed, escalation, health_check, dispatch, assign
  • Type-safe sendProtocol<T>() and parsePayload<T>() for structured agent coordination
  • JSON payload column with schema migration handling 3 upgrade paths
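
A typed protocol layer over a JSON payload column might look like the sketch below. The payload fields, the MailMessage shape, and both function signatures are invented for illustration; only the message type names and the existence of sendProtocol<T>() and parsePayload<T>() come from the changelog.

```typescript
// Payload shapes are hypothetical; only the type names appear in the changelog.
interface ProtocolPayloads {
  worker_done: { taskId: string; branch: string };
  merge_failed: { branch: string; reason: string };
}
type ProtocolType = keyof ProtocolPayloads;

interface MailMessage {
  to: string;
  type: ProtocolType;
  payload: string; // JSON-encoded, as it would be stored in a payload column
}

// The generic parameter ties the payload shape to the message type at compile time.
function sendProtocol<T extends ProtocolType>(
  to: string,
  type: T,
  payload: ProtocolPayloads[T],
): MailMessage {
  return { to, type, payload: JSON.stringify(payload) };
}

function parsePayload<T extends ProtocolType>(message: MailMessage, expected: T): ProtocolPayloads[T] {
  if (message.type !== expected) {
    throw new Error(`expected ${expected}, got ${message.type}`);
  }
  return JSON.parse(message.payload) as ProtocolPayloads[T];
}

const sent = sendProtocol("coordinator", "worker_done", { taskId: "task-42", branch: "agent/builder-1" });
const received = parsePayload(sent, "worker_done");
```

Passing a merge_failed payload with a worker_done type would fail to compile, which is the main benefit of typing the protocol instead of sending free-form JSON.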

Agent Nudging

  • overstory nudge command with retry (3x), debounce (500ms), and --force to skip debounce
  • Auto-nudge on urgent/high priority mail send

Structural Tool Enforcement

  • PreToolUse hooks mechanically block file-modifying tools (Write/Edit/NotebookEdit) for non-implementation agents (scout, reviewer, coordinator, supervisor)
  • PreToolUse Bash guards block dangerous git operations (push, reset --hard, clean -f, etc.) for all agents
  • Whitelist git add/commit for coordinator/supervisor capabilities while keeping git push blocked
  • Block Claude Code native team/task tools (Task, TeamCreate, etc.) for all overstory agents — enforces overstory sling delegation
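
The guard rules above can be expressed as a single decision function. This is a sketch under stated assumptions, not the project's hook code: `checkToolUse` and `GuardDecision` are hypothetical names, only the tools and git operations named in the list are covered, and the git add/commit whitelist is not modelled.

```typescript
// Capability names come from the changelog; everything else here is illustrative.
type Capability =
  | "scout" | "builder" | "reviewer" | "lead" | "merger" | "coordinator" | "supervisor";

interface GuardDecision {
  allowed: boolean;
  reason?: string;
}

const NON_IMPLEMENTATION = new Set<Capability>(["scout", "reviewer", "coordinator", "supervisor"]);
const FILE_MODIFYING_TOOLS = new Set<string>(["Write", "Edit", "NotebookEdit"]);
const NATIVE_TEAM_TOOLS = new Set<string>(["Task", "TeamCreate"]);

// Only the operations named in the changelog; the real list is longer ("etc.").
const DANGEROUS_GIT: RegExp[] = [
  /\bgit\s+push\b/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+clean\s+-f/,
];

function checkToolUse(capability: Capability, tool: string, command = ""): GuardDecision {
  // Native team/task tools are blocked for every agent.
  if (NATIVE_TEAM_TOOLS.has(tool)) {
    return { allowed: false, reason: "delegate with overstory sling instead" };
  }
  // File-modifying tools are blocked for non-implementation agents.
  if (FILE_MODIFYING_TOOLS.has(tool) && NON_IMPLEMENTATION.has(capability)) {
    return { allowed: false, reason: `${capability} agents may not modify files` };
  }
  // Dangerous git operations are blocked for every agent.
  if (tool === "Bash" && DANGEROUS_GIT.some((pattern) => pattern.test(command))) {
    return { allowed: false, reason: "dangerous git operation" };
  }
  return { allowed: true };
}
```

With these rules, `checkToolUse("scout", "Write")` is denied, while `checkToolUse("coordinator", "Bash", "git commit -m wip")` is allowed because commits are not in the dangerous list.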

Watchdog Improvements

  • ZFC principle: tmux liveness as primary signal, pid check as secondary, sessions.json as tertiary
  • Descendant tree walking for process cleanup — getPanePid(), getDescendantPids(), killProcessTree() with SIGTERM → grace → SIGKILL
  • Re-check zombies on every tick, handle investigate action
  • Stalled state added to zombie reconciliation
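
Descendant tree walking can be illustrated with a pure function over a pid-to-parent map. This sketch borrows the name `getDescendantPids()` from the entry above, but its signature and the `ParentMap` input are assumptions. Collecting the process table and sending SIGTERM, then SIGKILL after a grace period, are left out.

```typescript
// Assumed input: pid -> parent pid, e.g. parsed from a process listing (parsing not shown).
type ParentMap = Map<number, number>;

// Return all descendants of `root`, deepest first, so that children can be
// signalled before their parents.
function getDescendantPids(root: number, parents: ParentMap): number[] {
  // Invert the map into parent -> children lists.
  const children = new Map<number, number[]>();
  parents.forEach((ppid, pid) => {
    const siblings = children.get(ppid) ?? [];
    siblings.push(pid);
    children.set(ppid, siblings);
  });

  const ordered: number[] = [];
  const seen = new Set<number>([root]); // guards against malformed, cyclic input
  const visit = (pid: number): void => {
    for (const child of children.get(pid) ?? []) {
      if (seen.has(child)) continue;
      seen.add(child);
      visit(child); // post-order: descend first, then record the child
      ordered.push(child);
    }
  };
  visit(root);
  return ordered;
}
```

A caller would typically signal the returned pids in order and the root last, then re-check liveness after the grace period before escalating.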

Worker Self-Propulsion (Phase 3)

  • Builder agents send worker_done mail on task completion
  • Overlay quality gates include worker_done signal step
  • Prime activation context injection for bound tasks
  • MISSING_WORKER_DONE failure mode in builder definition

Interactive Agent Mode

  • Switch sling from headless (claude -p) to interactive mode with tmux sendKeys beacon — hooks now fire, enabling mail, metrics, logs, and lastActivity updates
  • Structured buildBeacon() with identity context and startup protocol
  • Fix beacon sendKeys multiline bug (increase initial sleep, follow-up Enter after 500ms)

CLI Improvements

  • --verbose flag for overstory status
  • --json flag for overstory sling
  • --background flag for overstory watch
  • Help text for unknown subcommands
  • SUPPORTED_CAPABILITIES constant and Capability type

Init & Deployment

  • overstory init now deploys agent definitions (copies agents/*.md to .overstory/agent-defs/) via import.meta.dir resolution
  • E2E lifecycle test validates full init → config → manifest → overlay pipeline on throwaway external projects

Testing Improvements

  • Colocated tests with source files (moved from __tests__/ to src/)
  • Shared test harness: createTempGitRepo(), cleanupTempDir(), commitFile() in src/test-helpers.ts
  • Replaced Bun.spawn mocks with real implementations in 3 test files
  • Optimized test harness: 38.1s → 11.7s (-69%)
  • Comprehensive metrics command test coverage
  • E2E init-sling lifecycle test
  • Test suite has grown since the initial release to 515 tests across 24 files (1286 expect() calls)

Fixed

  • 60+ bugs resolved across 8 dedicated fix sessions, covering P1 criticals through P4 backlog items:
    • Hooks enforcement: tool guard sed patterns now handle optional space after JSON colons
    • Status display: filter completed sessions from active agent count
    • Session lifecycle: move session recording before beacon send to fix booting → working race condition
    • Stagger delay (staggerDelayMs) now actually enforced between agent spawns
    • Hardcoded main branch replaced with dynamic branch detection in worktree/manager and merge/resolver
    • Sling headless mode fixes for E2E validation
    • Input validation, environment variable handling, init improvements, cleanup lifecycle
    • .gitignore patterns for .overstory/ artifacts
    • Mail, merge, and worktree subsystem edge cases
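
The hooks enforcement fix concerns whitespace tolerance when matching raw JSON text. The project's guards use sed patterns, which this changelog does not show. The TypeScript regex below only illustrates the same idea, and the `tool_name` field and the `extractToolName` helper are assumptions.

```typescript
// Assumed field name: `tool_name`, with the hook payload arriving as raw JSON text.
// `\s*` on both sides of the colon accepts `"tool_name":"Write"` and `"tool_name": "Write"`.
const TOOL_NAME = /"tool_name"\s*:\s*"([^"]+)"/;

function extractToolName(rawJson: string): string | null {
  const match = TOOL_NAME.exec(rawJson);
  return match?.[1] ?? null;
}
```

A pattern written as `"tool_name":"` with no allowance for whitespace would miss payloads serialized with a space after the colon, which is the class of bug described above.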

Changed

  • Agent propulsion principle: failure modes, cost awareness, and completion protocol added to all agent definitions
  • Agent quality gates updated across all base definitions
  • Test file paths updated from __tests__/ convention to colocated src/**/*.test.ts

0.1.0 - 2026-02-12

Added

  • CLI entry point with command router (overstory <command>)
  • overstory init — initialize .overstory/ in a target project
  • overstory sling — spawn worker agents in git worktrees via tmux
  • overstory prime — load context for orchestrator or agent sessions
  • overstory status — show active agents, worktrees, and project state
  • overstory mail — SQLite-based inter-agent messaging (send/check/list/read/reply)
  • overstory merge — merge agent branches with 4-tier conflict resolution
  • overstory worktree — manage git worktrees (list/clean)
  • overstory log — hook event logging (NDJSON + human-readable)
  • overstory watch — watchdog daemon with health monitoring and AI-assisted triage
  • overstory metrics — session metrics storage and reporting
  • Agent manifest system with 5 base agent types (scout, builder, reviewer, lead, merger)
  • Two-layer agent definition: base .md files (HOW) + dynamic overlays (WHAT)
  • Persistent agent identity and CV system
  • Hooks deployer for automatic worktree configuration
  • beads (bd) CLI wrapper for issue tracking integration
  • mulch CLI wrapper for structured expertise management
  • Multi-format logging with secret redaction
  • SQLite metrics storage for session analytics
  • Full test suite using bun test
  • Biome configuration for formatting and linting
  • TypeScript strict mode with noUncheckedIndexedAccess
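
For the NDJSON logging and secret redaction entries in the 0.1.0 list, a minimal sketch might look like the following. The redaction patterns, the `[REDACTED]` marker, and the function names are illustrative assumptions. The changelog does not say which secrets the real redactor recognises.

```typescript
// Hypothetical patterns: token-like API keys and bearer credentials.
const REDACTIONS: Array<[RegExp, string]> = [
  [/sk-[A-Za-z0-9_-]{20,}/g, "[REDACTED]"],
  [/(Bearer\s+)[A-Za-z0-9._-]+/g, "$1[REDACTED]"],
];

// Replace every match of every pattern in the text.
function redact(text: string): string {
  return REDACTIONS.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}

// Serialize one event as a single NDJSON line (one JSON object, newline-terminated),
// redacting secrets in the serialized text. The patterns never match quotes, so the
// line remains valid JSON after redaction.
function toNdjsonLine(event: Record<string, unknown>): string {
  return redact(JSON.stringify(event)) + "\n";
}
```

Redacting after serialization covers nested fields without walking the object, at the cost of relying on patterns that cannot span JSON string boundaries.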