Performance Investigation Requirements

This is the canonical contract for the /perf workflow. All agents, skills, hooks, and commands must follow these rules.

Non-Negotiable Rules

Run benchmarks sequentially (never in parallel).
Default minimum run duration is 60s (30s for binary search in breaking-point phase). For single-run benchmarks (start-to-end), use an explicit run-count (e.g. 3 runs with median aggregation) instead of time padding; record the run count + aggregation in logs. Shorter durations are allowed only for micro-benchmarks with explicit user approval and must be recorded in logs.
Change one thing at a time; revert to baseline between experiments.
Start narrow and expand only with explicit user approval.
Verify anomalies by re-running.
Establish a clean baseline before any experiment.
Keep resource use minimal and repeatable.
Check git history before hypotheses or changes.
Clarify terminology before acting on ambiguous requests.
Write logs + checkpoint commit after every phase.

Every phase log must include:

Baseline command must output PERF_METRICS markers.
Baseline JSON is stored at {state-dir}/perf/baselines/.json.
Baseline metrics must be numeric and comparable across runs.
Benchmarks should honor PERF_RUN_MODE=oneshot by emitting metrics immediately without padding.

All perf state is stored under {state-dir}/perf/:

State directory is platform-specific: