Skip to content

feat(pipeline): add configurable multi-reviewer review panels#356

Open
henrylaih41 wants to merge 8 commits into
kunchenguid:mainfrom
henrylaih41:feat/multi-reviewer-panel
Open

feat(pipeline): add configurable multi-reviewer review panels#356
henrylaih41 wants to merge 8 commits into
kunchenguid:mainfrom
henrylaih41:feat/multi-reviewer-panel

Conversation

@henrylaih41

Copy link
Copy Markdown

What Changed

  • Adds configurable multi-reviewer review panels, including reviewer fan-out, validation/deduplication, finding attribution, and pipeline/TUI rendering for panel-sourced review results.
  • Extends config parsing and trusted repo-config handling for reviewer panels, including reserved-argument checks, empty-argument validation, and effective-reviewer resolution.
  • Documents reviewer panel configuration, read-only reviewer expectations, gate behavior, agent guidance, and related global/repo config reference updates.

Risk Assessment

⚠️ Medium: [claude] The feature is opt-in and defaults to byte-identical single-agent behavior, the security/trust boundary for the new code-executing Review config is correctly handled, and the only substantive concern is concurrency flakiness for repos that explicitly enable a multi-reviewer panel.; [codex] The main fanout and trust-boundary paths are covered, but same-family reviewer configuration has attribution and arg-precedence edge cases that can silently undermine the advertised panel behavior.

Testing

  • ⏭️ Test - skipped

Pipeline

Updates from git push no-mistakes

⏭️ **intent** - skipped

✅ No issues found.

✅ **Rebase** - passed

✅ No issues found.

⚠️ **Review** - 4 issues (3 warnings, 1 info)
  • ⚠️ internal/pipeline/steps/review_panel.go:37 - runReviewPanel fans N reviewer agents out concurrently against one shared worktree (CWD = sctx.WorkDir). The code comment defends this on the data-safety axis ("reviewers are READ-ONLY by contract"), but the distinct operational risk is git-lock contention: real coding agents (claude/codex) routinely run subcommands that take .git/index.lock even for nominally read-only inspection (git status, git add -A, git stash), and two reviewers colliding on the lock surface an error. Under the DEFAULT fail-closed policy (review.fail_open=false) any single reviewer error fails the entire review step (processReviewerResults returns on first Err). Because the panel re-runs on every post-fix re-review, this is a repeated flakiness vector for opted-in repos. Worth confirming the tradeoff is acceptable or whether per-reviewer worktree isolation (or a lock-tolerant default) is wanted; the existing defense addresses corruption, not lock-contention-induced step failure.
  • ℹ️ internal/types/findings.go:346 - SeverityRank is newly exported and unit-tested but never referenced in production code. combineReviewerFindings explicitly states it does NO severity-escalation, so the intended consumer never shipped. It's harmless dead code (RiskRank, its sibling, is used); flagging only so it isn't mistaken for wired-up behavior. Either remove it or wire it into the intended severity reconciliation.
  • ⚠️ internal/config/config.go:522 - ReviewerArgs only treats non-empty args as a per-reviewer override, so args: [] still inherits agent_args_override. That makes it impossible to opt a reviewer out of a global model/flag override and can even dedup away an intended same-family reviewer because the effective args collapse to the inherited value. Consider checking whether spec.Args is nil rather than len(spec.Args) > 0 so an explicit empty list means 'no extra args'.
  • ⚠️ internal/pipeline/steps/review_panel.go:96 - Panel attribution stamps every finding with only Agent.Name(). For the supported case of two same-family reviewers with different args or path, both findings render/log as the same source such as codex, so the human gate cannot tell which model/config produced which finding. Include the stable slot or an explicit reviewer label in the source/log label when same-family reviewers are distinct.
⏭️ **Test** - skipped

Step was skipped.

✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

Yen Li Laih and others added 8 commits June 27, 2026 21:47
Let the review step fan out to N reviewers (mixed model families, e.g. codex +
claude) reviewing the same diff independently; their findings are merged into one
attributed union and the single configured agent reconciles and fixes. With no
review.reviewers configured the behavior is byte-identical to today.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…onal

Reviewers inspect the diff and return findings; they never write the worktree.
The shared review CWD is intentional and safe, so we do not isolate or clean up
a per-reviewer worktree. A reviewer that writes is a misconfiguration, not a
case this code defends against.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@henrylaih41

Copy link
Copy Markdown
Author

For context: #327 is the previous CR for this feature, and #356 is the improved, gated re-raise of it. The core behavior is the same — opt-in cross-family multi-reviewer panel, attributed-union merge, single-fixer reconciliation, fail-closed default, and re-review-on-fix — with the no-reviewers case staying byte-identical to today.

Core differences from #327

  • Slot-stable, collision-free IDs: review-<name>-<slot>-N via types.NormalizeFindings, so IDs don't shift when fail_open drops an earlier reviewer and a reviewer can't smuggle a colliding id.
  • Validation moved off the untrusted parse path: LoadRepo/parseRepoConfig no longer validate review; review is code-executing config taken only from the trusted default-branch copy via EffectiveRepoConfig, with LoadRepoFromBytes (trusted) and ResolveReviewers (post-trust anchor) as the two validation points.
  • Shared validation + reserved-arg re-check after auto expansion: one validateReviewerSpec helper used by both validation points; reserved-arg check re-runs after auto resolves to a concrete family.
  • Dedup on the effective reviewer: dedup key built from ReviewerPath/ReviewerArgs after fallbacks, so inherited specs collide with explicit ones resolving to the same binary.
  • Evidence fields merged: Tested, Artifacts, TestingSummary now round-trip in multi-reviewer mode.
  • RepoConfig.Review is now *ReviewRaw: distinguishes absent (inherit global panel) from explicit-empty (disable inherited panel).
  • source wired through the UI/skill: added to axi_render and the skill guidance so the fix agent and gate actually attribute by reviewer.

On the remaining warnings

Most of the flagged issues are edge cases that don't impact core correctness — e.g. same-family reviewer Source ambiguity only affects the same-family-on-different-models use case (cross-family panels are unaffected), git-lock contention is an operational-flakiness concern for opted-in repos rather than a correctness bug, ReviewerArgs len > 0 vs nil is an opt-out edge case, and SeverityRank is harmless dead code. None of them corrupt findings, mis-merge, mis-rank risk, or break the fix loop.

The reason these aren't addressed in this CR: it has been through several rounds of review and has not converged after 8 rounds of loop. We decided to open the PR now since the remaining issues are not critical and not a blocker, and can follow up on them separately.

@henrylaih41

Copy link
Copy Markdown
Author

Test step

The test step is marked skipped in this PR's pipeline run. The run executed on a fork that did not contain #310, which adds the .no-mistakes.yaml defining the test command. Without that config the test step had no command to run and fell back to the agent test-runner, whose process exited non-zero even though the underlying tests passed — a false failure rather than a genuine one.

The suite was run directly on this branch with the race detector enabled, and all tests pass:

$ go test -race ./...
ok  	github.com/kunchenguid/no-mistakes	11.199s
ok  	github.com/kunchenguid/no-mistakes/cmd/fakeagent	1.619s
?   	github.com/kunchenguid/no-mistakes/cmd/genskill	[no test files]
ok  	github.com/kunchenguid/no-mistakes/cmd/no-mistakes	1.385s
ok  	github.com/kunchenguid/no-mistakes/cmd/recordfixture	8.366s
ok  	github.com/kunchenguid/no-mistakes/internal/agent	13.106s
ok  	github.com/kunchenguid/no-mistakes/internal/bitbucket	2.100s
?   	github.com/kunchenguid/no-mistakes/internal/buildinfo	[no test files]
ok  	github.com/kunchenguid/no-mistakes/internal/cimonitor	1.840s
ok  	github.com/kunchenguid/no-mistakes/internal/cli	16.628s
ok  	github.com/kunchenguid/no-mistakes/internal/config	4.699s
ok  	github.com/kunchenguid/no-mistakes/internal/conventional	(cached)
ok  	github.com/kunchenguid/no-mistakes/internal/daemon	27.925s
ok  	github.com/kunchenguid/no-mistakes/internal/db	4.854s
?   	github.com/kunchenguid/no-mistakes/internal/e2e	[no test files]
ok  	github.com/kunchenguid/no-mistakes/internal/gate	7.817s
ok  	github.com/kunchenguid/no-mistakes/internal/git	12.251s
ok  	github.com/kunchenguid/no-mistakes/internal/intent	6.113s
ok  	github.com/kunchenguid/no-mistakes/internal/ipc	4.598s
ok  	github.com/kunchenguid/no-mistakes/internal/paths	(cached)
ok  	github.com/kunchenguid/no-mistakes/internal/pipeline	5.738s
ok  	github.com/kunchenguid/no-mistakes/internal/pipeline/steps	94.250s
ok  	github.com/kunchenguid/no-mistakes/internal/safeurl	(cached)
ok  	github.com/kunchenguid/no-mistakes/internal/scm	4.972s
ok  	github.com/kunchenguid/no-mistakes/internal/scm/github	8.210s
ok  	github.com/kunchenguid/no-mistakes/internal/scm/gitlab	7.845s
ok  	github.com/kunchenguid/no-mistakes/internal/shellenv	4.730s
ok  	github.com/kunchenguid/no-mistakes/internal/skill	4.667s
ok  	github.com/kunchenguid/no-mistakes/internal/telemetry	(cached)
ok  	github.com/kunchenguid/no-mistakes/internal/tui	2.718s
ok  	github.com/kunchenguid/no-mistakes/internal/types	(cached)
ok  	github.com/kunchenguid/no-mistakes/internal/update	2.716s
ok  	github.com/kunchenguid/no-mistakes/internal/wizard	(cached)

CI on this PR will verify this again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant