Skip to content

fix(config): reject unknown YAML keys + validate agent name at load#303

Open
e-jung wants to merge 1 commit into
kunchenguid:mainfrom
e-jung:fix/config-strict-mode-and-agent-validation
Open

fix(config): reject unknown YAML keys + validate agent name at load#303
e-jung wants to merge 1 commit into
kunchenguid:mainfrom
e-jung:fix/config-strict-mode-and-agent-validation

Conversation

@e-jung

@e-jung e-jung commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the silent-misconfiguration gap in no-mistakes' config layer. Both config loaders now reject unknown YAML keys at parse time, and the agent field is validated at load — so typos and bad values fail loudly with a config-line error instead of silently disabling features or surfacing as a generic error at run start.

Matches audit v5 §14 (unknown YAML keys silently ignored) and §15 (agent name not validated at load) — the last item in the v5 robustness ship set.

The bug

Both loaders used plain yaml.Unmarshal, the loosest decoder in gopkg.in/yaml.v3. The only field-level validation in the whole package was validateAgentArgsOverride (config.go:384). Two classes of silent misconfiguration slipped through:

  1. Unknown YAML keys silently ignored (§14). Typos like these all parsed cleanly, applied defaults, and ran as if the field weren't set — features just didn't activate, with no error anywhere:

    • agent: claud
    • citimeout: "4h" (the canonical key is ci_timeout)
    • auto_fix: { rebase: 3 } vs auto_fix: { rebas: 3 }
    • commands: { tst: "go test" } (the only field-level validation is on agent_args_override)
    • intent: { thresh: 0.2 }
  2. Agent name not validated at load (§15). ResolveAgent only probes when agent: auto; for any explicit value it returned nil immediately. isACPAgent (config.go:231) and the agent-name→binary map (config.go:214) existed but were never consulted at load. So agent: claud (typo), agent: "" (empty), agent: "acp:" (empty target), and agent: "acp: foo" (whitespace) all parsed cleanly. The user only found out when a run started — through a generic error from agent.NewWithOptions, not a config-line error.

This amplifies every other typo class: a config that looks correct stays silent, and the resulting "feature didn't activate" symptoms are hard to attribute.

The fix

Strict decoding. Both LoadGlobal and LoadRepo now decode via yaml.NewDecoder(r).KnownFields(true). Unknown keys produce a parse error of the form parse global config: field X not found in type globalConfigRaw, with the file path and field name. The legacy babysit_timeout / auto_fix.babysit aliases keep working because they're declared fields on globalConfigRaw / AutoFixRaw.

Agent validation at load. New validateAgentName helper encodes the accepted set:

  • auto, claude, codex, rovodev, opencode, pi (via the existing defaultBinary map).
  • acp:<target> for any non-empty, whitespace-free target (via the existing isACPAgent).
  • Empty is allowed (means default/inherit).

LoadGlobal calls validateGlobalAgent after the strict decode: it rejects typo/unknown values, and also rejects an explicit agent: "" (distinguished from an absent agent: key via a one-shot map[string]any probe) so a user who explicitly cleared the field gets a clear error rather than a silent fallback to auto. LoadRepo calls validateRepoAgent, which rejects typos but keeps empty-as-inherit (the documented semantics — repo agent inherits from global).

Errors are wrapped with the file path and the agent field name, e.g. global config /home/user/.no-mistakes/config.yaml: agent "claud" is not recognized (valid: auto, claude, codex, rovodev, opencode, pi, or acp:<target>).

Verification

gofmt -l . clean · go vet ./... clean · go build ./... OK · go test -race ./... green (Go 1.26.4 — system Go 1.18 is too old).

New tests in internal/config/config_strict_test.go:

  • Typo rejection: TestLoadGlobal_RejectsUnknownTopLevelKey, TestLoadGlobal_RejectsUnknownNestedKey, TestLoadRepo_RejectsUnknownTopLevelKey, TestLoadRepo_RejectsUnknownNestedKey — confirm citimeout: "4h", commands: { tst: ... }, auto_fix: { rebse: 3 }, intent: { thresh: 0.2 }, test: { evidence: { dirr: foo } } all fail to load with a parse error (was: silently accepted).
  • Agent validation: TestLoadGlobal_RejectsAgentTypo, TestLoadGlobal_RejectsAgentEmpty, TestLoadGlobal_RejectsAgentBareEmpty, TestLoadGlobal_RejectsACPEmptyTarget, TestLoadGlobal_RejectsACPWhitespaceTarget, TestLoadRepo_RejectsAgentTypo, TestLoadRepo_RejectsACPEmptyTarget.
  • No regression: TestLoadGlobal_AcceptsValidAgents (all 5 native agents + auto + acp:gemini), TestLoadGlobal_AcceptsAllDocumentedKeys, TestLoadRepo_AcceptsAllDocumentedKeys (exercises legacy babysit_timeout / auto_fix.babysit aliases), TestLoadGlobal_MissingAgentKeyUsesDefault, TestLoadRepo_AcceptsEmptyAgentAsInherit (absent + explicit empty), TestValidateAgentName (table-driven unit test on the helper).
  • Existing config tests (TestLoadGlobal_*, TestLoadRepo_*, TestLoadGlobal_ACPConfig, TestDefaultConfigYAML_MatchesGoDefaults, TestLoadGlobal_Legacy*) continue to pass unchanged, confirming the strict decoder doesn't reject any documented field.

Notes for review

  • The default config template (defaultConfigYAML) and the e2e harness's generated configs only ever used known keys, so strict mode is a no-op on existing valid configs. The only behavior change is for configs that were previously silently broken.
  • I went with the strict-decoder path (rather than the warning-on-diff fallback the audit mentions as an alternative) because the package already had a KnownFields(true)-friendly schema (every legacy alias is a declared struct field), so there's no migration surface. If this breaks a user config in the wild that I missed, the fix is mechanical: declare the field on globalConfigRaw / RepoConfig.
  • I made repo-config agent: "" (explicit empty) tolerated rather than rejected, because the documented semantics are "empty = inherit from global." Global config's agent: "" is rejected because there's no parent to inherit from — the silent fallback to auto is the bug §15 calls out.
  • AGENTS.md gains one line recording the strict-decoding convention so future contributors know to declare new YAML fields on the raw struct (or they'll be rejected as unknown).

AI disclosure: Human-reviewed. The change was produced with AI assistance and reviewed by a human contributor before submission.

Both config loaders (config.LoadGlobal and config.LoadRepo) used plain
yaml.Unmarshal, so typos like 'agent: claud', 'citimeout: "4h"',
'commands: { tst: ... }', or 'intent: { thresh: 0.2 }' parsed cleanly
and silently disabled features. Separately, ResolveAgent only probed
for 'agent: auto'; any explicit value was returned as-is, so
'agent: claud', 'agent: "acp:"', and 'agent: "acp: foo"' failed late
with a generic error at run start instead of at config load.

Switch both loaders to yaml.NewDecoder(r).KnownFields(true) so unknown
keys are rejected with a clear parse error naming the bad field. Add a
validate step at the end of each load that calls isACPAgent for acp:
values and the known-agent set for native names, rejecting empty (when
explicitly set on global), typo, and unknown values with file path +
field name. Repo config keeps empty-as-inherit semantics.

Adds table-driven tests covering: typo'd top-level and nested keys,
agent typos, explicit-empty agent, acp: edge cases, and confirmation
that all documented keys (including legacy babysit_* aliases) still
load cleanly.

Audit v5 section 14 (unknown YAML keys silently ignored) and section 15
(agent name not validated at load).
@kunchenguid

Copy link
Copy Markdown
Owner

Thanks for the contribution, @e-jung! One process note on this repo: PRs need to be raised through no-mistakes (git push no-mistakes), which runs the review/test/lint/CI pipeline and stamps the required signature into the PR body. This PR is currently failing the "PR must be raised via no-mistakes" check because it wasn't raised that way.

If working from a fork was the blocker, that's fixed as of v1.30.0 (#306). Per CONTRIBUTING.md: point origin at this parent repo, run no-mistakes init --fork-url <your-fork-url>, then git push no-mistakes.

I won't be merging PRs that bypass the gate going forward, but I'd genuinely love to land your work once it's re-raised. Thanks for understanding! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants