Description
Plan mode works well for taking a problem and defining a complete solution, which sometimes involves touching a large amount of code. If the LLM goes off on a misguided tangent early, the only way to stop it is to interrupt the agent manually, which means looking over its shoulder the whole time.
Asking the agent to interrupt itself between units of work doesn't work. It will add things like this to the plan:
> Review Gate: stop and request user review before starting Unit 2.
> Execution rule: do not start the next unit until user reviews the previous unit’s diff + verification output.

but it will then completely ignore these "review gates" and keep going until the plan is finished.
It would be useful to have a way to enforce those checkpoints.
Use Case
Described above: a more collaborative workflow that prevents the LLM from going on wild goose chases and dumping a lot of code on the user to review. Vibe coding makes sense for one-off scripts, but when you need to understand what the LLM writes, it's better to have more control over when reviews happen.
Area
Other
Proposed Solution
The way I'd see it is to have an "execute in supervised mode" option as one of the choices when the plan is complete, which would insert the checkpoints before executing. Another option would be a separate mode, /supervised-plan, that inserts those checkpoints on the first pass.
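To illustrate the idea, here is a minimal sketch of what supervised execution could look like if the checkpoints were enforced by the harness running the plan rather than by text inside the plan. Every name in it (`Unit`, `SupervisedRun`, `execute_supervised`, `approve`) is hypothetical and only there for illustration; none of it is an existing API, and the real implementation may need a very different shape.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Unit:
    """One unit of work in an approved plan (hypothetical structure)."""
    name: str
    run: Callable[[], str]  # performs the work and returns a diff or summary


@dataclass
class SupervisedRun:
    """Result of a supervised execution: what ran and where it stopped."""
    completed: List[str] = field(default_factory=list)
    stopped_before: Optional[str] = None


def execute_supervised(
    units: List[Unit],
    approve: Callable[[str, str], bool],
) -> SupervisedRun:
    """Run units in order, pausing for review after each one.

    The gate lives in the executor loop, not in the plan text, so the
    model has no way to skip it. `approve(unit_name, output)` stands in
    for the user's review and returns True to continue.
    """
    result = SupervisedRun()
    for index, unit in enumerate(units):
        output = unit.run()
        result.completed.append(unit.name)
        is_last = index == len(units) - 1
        if not is_last and not approve(unit.name, output):
            # The reviewer rejected this unit: do not start the next one.
            result.stopped_before = units[index + 1].name
            break
    return result


# Usage: the reviewer approves Unit 1 but rejects Unit 2.
units = [
    Unit("Unit 1", lambda: "diff for unit 1"),
    Unit("Unit 2", lambda: "diff for unit 2"),
    Unit("Unit 3", lambda: "diff for unit 3"),
]
run = execute_supervised(units, approve=lambda name, output: name == "Unit 1")
print(run.completed)       # ['Unit 1', 'Unit 2']
print(run.stopped_before)  # Unit 3
```

The point of the sketch is that the stop happens in code the agent does not control, so a "keep going until complete" directive cannot override it.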
Alternatives Considered
I tried instructing the agent directly, but those instructions are ignored because of the plan mode prompts.
The agent's explanation of what happened:
Yes — I did see conflicting directives, and I chose the wrong one for your intended
workflow.

What conflicted:
- Plan-level checkpoints said:
  - stop after each unit
  - request your review before continuing
- Higher-level execution directives in-session also said:
  - “Plan approved. You MUST execute it now.”
  - “You MUST keep going until complete.”
  - “You MUST NOT ask for confirmation to continue work.”
  - “You MUST NOT yield incomplete work.”
Given that set, I interpreted the run mode as “continuous execution to completion,” which
overrode the staged review gates in practice.