feat(pipeline): add configurable multi-reviewer review panels #356
henrylaih41 wants to merge 8 commits into
Let the review step fan out to N reviewers (mixed model families, e.g. codex + claude) reviewing the same diff independently; their findings are merged into one attributed union and the single configured agent reconciles and fixes. With no review.reviewers configured the behavior is byte-identical to today. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…alidation, and gate source rendering
…d-config parse path
…IDs, and merge evidence fields
…tive reviewer; share validation
…onal Reviewers inspect the diff and return findings; they never write the worktree. The shared review CWD is intentional and safe, so we do not isolate or clean up a per-reviewer worktree. A reviewer that writes is a misconfiguration, not a case this code defends against. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
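The fan-out and attributed-union merge described above can be illustrated with a minimal Go sketch. It is not the code in this PR: `Finding`, `Reviewer`, `runPanel`, `okReviewer`, and `failingReviewer` are hypothetical names, and the sketch assumes reviewers are plain functions that read a diff and return findings without touching shared state.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Finding is a hypothetical, simplified review finding.
type Finding struct {
	Source  string // name of the reviewer that reported it
	File    string
	Message string
}

// Reviewer is a hypothetical read-only reviewer: it inspects a diff and
// returns findings, never writing anything.
type Reviewer struct {
	Name   string
	Review func(diff string) ([]Finding, error)
}

// runPanel runs every reviewer concurrently on the same diff and merges the
// results into one union, stamping each finding with its reviewer's name.
// With failOpen=false (the fail-closed default in this sketch), the first
// reviewer error in slot order fails the whole panel; with failOpen=true the
// failed reviewer is skipped.
func runPanel(reviewers []Reviewer, diff string, failOpen bool) ([]Finding, error) {
	type result struct {
		findings []Finding
		err      error
	}
	results := make([]result, len(reviewers))

	var wg sync.WaitGroup
	for i, r := range reviewers {
		wg.Add(1)
		go func(i int, r Reviewer) {
			defer wg.Done()
			f, err := r.Review(diff)
			results[i] = result{findings: f, err: err} // each goroutine owns one slot
		}(i, r)
	}
	wg.Wait()

	var merged []Finding
	for i, res := range results {
		if res.err != nil {
			if failOpen {
				continue
			}
			return nil, fmt.Errorf("reviewer %s: %w", reviewers[i].Name, res.err)
		}
		for _, f := range res.findings {
			f.Source = reviewers[i].Name
			merged = append(merged, f)
		}
	}
	return merged, nil
}

// okReviewer builds a reviewer that always reports one finding.
func okReviewer(name, msg string) Reviewer {
	return Reviewer{Name: name, Review: func(string) ([]Finding, error) {
		return []Finding{{File: "main.go", Message: msg}}, nil
	}}
}

// failingReviewer builds a reviewer that always errors.
func failingReviewer(name string) Reviewer {
	return Reviewer{Name: name, Review: func(string) ([]Finding, error) {
		return nil, errors.New("reviewer crashed")
	}}
}

func main() {
	panel := []Reviewer{okReviewer("codex", "unchecked error"), failingReviewer("claude")}

	_, err := runPanel(panel, "diff", false)
	fmt.Println("fail-closed error:", err)

	merged, _ := runPanel(panel, "diff", true)
	fmt.Println("fail-open findings:", len(merged), merged[0].Source)
}
```

Collecting results by slot index rather than over a channel keeps the merged order deterministic, which matters if the merged list is later shown to a human gate.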
For context: #327 is the previous CR for this feature, and #356 is the improved, gated re-raise of it. The core behavior is the same (opt-in cross-family multi-reviewer panel, attributed-union merge, single-fixer reconciliation, fail-closed default, and re-review-on-fix), with the no-reviewers case staying byte-identical to today.
Core differences from #327
On the remaining warnings
Most of the flagged issues are edge cases that don't impact core correctness, e.g. same-family reviewer
The reason these aren't addressed in this CR: it has been through several rounds of review and had not converged after 8 rounds of the review loop. We decided to open the PR now since the remaining issues are not critical and not blockers, and we can follow up on them separately.
Test step
The suite was run directly on this branch with the race detector enabled, and all tests pass. CI on this PR will verify this again.
What Changed
Risk Assessment
Testing
Pipeline
Updates from git push no-mistakes
⏭️ **intent** - skipped
✅ No issues found.
✅ **Rebase** - passed
✅ No issues found.
internal/pipeline/steps/review_panel.go:37 - runReviewPanel fans N reviewer agents out concurrently against one shared worktree (CWD = sctx.WorkDir). The code comment defends this on the data-safety axis ("reviewers are READ-ONLY by contract"), but the distinct operational risk is git-lock contention: real coding agents (claude/codex) routinely run subcommands that take .git/index.lock even for nominally read-only inspection (git status, git add -A, git stash), and two reviewers colliding on the lock surface an error. Under the DEFAULT fail-closed policy (review.fail_open=false) any single reviewer error fails the entire review step (processReviewerResults returns on first Err). Because the panel re-runs on every post-fix re-review, this is a repeated flakiness vector for opted-in repos. Worth confirming the tradeoff is acceptable or whether per-reviewer worktree isolation (or a lock-tolerant default) is wanted; the existing defense addresses corruption, not lock-contention-induced step failure.
internal/types/findings.go:346 - SeverityRank is newly exported and unit-tested but never referenced in production code. combineReviewerFindings explicitly states it does NO severity-escalation, so the intended consumer never shipped. It's harmless dead code (RiskRank, its sibling, is used); flagging only so it isn't mistaken for wired-up behavior. Either remove it or wire it into the intended severity reconciliation.
internal/config/config.go:522 - `ReviewerArgs` only treats non-empty `args` as a per-reviewer override, so `args: []` still inherits `agent_args_override`. That makes it impossible to opt a reviewer out of a global model/flag override and can even dedup away an intended same-family reviewer because the effective args collapse to the inherited value. Consider checking whether `spec.Args` is nil rather than `len(spec.Args) > 0` so an explicit empty list means 'no extra args'.
internal/pipeline/steps/review_panel.go:96 - Panel attribution stamps every finding with only `Agent.Name()`. For the supported case of two same-family reviewers with different `args` or `path`, both findings render/log as the same source such as `codex`, so the human gate cannot tell which model/config produced which finding. Include the stable slot or an explicit reviewer label in the source/log label when same-family reviewers are distinct.
⏭️ **Test** - skipped
Step was skipped.
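To make the config.go:522 suggestion concrete, here is a small sketch of the nil-versus-empty distinction. `ReviewerSpec` and `effectiveArgs` are hypothetical stand-ins, not the types in this PR, and the sketch assumes the config decoder actually leaves an omitted `args` key as a nil slice while decoding `args: []` to an empty non-nil one; that would need to be verified for the decoder in use.

```go
package main

import "fmt"

// ReviewerSpec is a hypothetical stand-in for one per-reviewer config entry.
type ReviewerSpec struct {
	Agent string
	Args  []string // nil: key omitted; empty but non-nil: explicit "no extra args"
}

// effectiveArgs returns the args a reviewer should run with. Any non-nil
// Args, including an empty list, is treated as an explicit per-reviewer
// override; only a nil Args inherits the global override.
func effectiveArgs(spec ReviewerSpec, globalOverride []string) []string {
	if spec.Args != nil {
		return spec.Args
	}
	return globalOverride
}

func main() {
	global := []string{"--model", "example-model"} // placeholder flag values

	inherit := ReviewerSpec{Agent: "codex"}                  // args omitted
	optOut := ReviewerSpec{Agent: "codex", Args: []string{}} // args: []

	fmt.Println("inherit:", effectiveArgs(inherit, global))
	fmt.Println("opt-out:", effectiveArgs(optOut, global))
}
```

With a `len(spec.Args) > 0` check instead, both specs above would resolve to the global override, which is the collapse the finding describes.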
✅ **Document** - passed
✅ No issues found.
✅ **Lint** - passed
✅ No issues found.
✅ **Push** - passed
✅ No issues found.
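For the review_panel.go:96 attribution finding, one possible labeling scheme is sketched below. `sourceLabels` and the `name#slot` format are illustrative assumptions, not something this PR defines: a family name that appears once is kept as-is, and repeated family names get their 1-based panel slot appended so same-family reviewers stay distinguishable.

```go
package main

import "fmt"

// sourceLabels returns one display label per reviewer slot. Names that occur
// once are returned unchanged; names that occur more than once get a
// "#<slot>" suffix, where slot is the reviewer's 1-based position in the panel.
func sourceLabels(agentNames []string) []string {
	counts := make(map[string]int, len(agentNames))
	for _, n := range agentNames {
		counts[n]++
	}

	labels := make([]string, len(agentNames))
	for i, n := range agentNames {
		if counts[n] > 1 {
			labels[i] = fmt.Sprintf("%s#%d", n, i+1)
		} else {
			labels[i] = n
		}
	}
	return labels
}

func main() {
	fmt.Println(sourceLabels([]string{"codex", "claude", "codex"}))
}
```

Using the panel slot rather than a per-family counter keeps a label stable as long as the configured reviewer order does not change.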