Skip to content

fix(git): headless-friendly signing + fail-fast on hung hooks/SSH#302

Open
e-jung wants to merge 1 commit into
kunchenguid:mainfrom
e-jung:fix/signing-friendly-and-no-hang-hooks
Open

fix(git): headless-friendly signing + fail-fast on hung hooks/SSH#302
e-jung wants to merge 1 commit into
kunchenguid:mainfrom
e-jung:fix/signing-friendly-and-no-hang-hooks

Conversation

@e-jung

@e-jung e-jung commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Makes no-mistakes' autofix git operations headless-safe and fail-fast, so two real-world hang/failure modes stop wedging the pipeline until the run's CI timeout:

  1. Commit signing fails headlessly (audit v5 §4, HIGH for enterprise). All autofix commits were plain git commit -m. CopyLocalUserIdentity copies only user.name/user.email, so commit.gpgsign=true (mandated by many enterprise policies) stays inherited. Every autofix commit then attempted to sign and failed with gpg: signing failed: Inappropriate ioctl for device or hung to pinentry-timeout. The e2e harness already works around this (harness.go: git config commit.gpgsign false), confirming the failure exists in the wild.
  2. Hooks / SSH passphrase prompts hang the pipeline (audit v5 §13, MED). NonInteractiveEnv set GIT_TERMINAL_PROMPT=0 but not SSH_ASKPASS_REQUIRE=never, so an SSH passphrase prompt (passphrased key, no agent) opened /dev/tty and blocked for hours. Interactive user-installed hooks (confirmations, 2FA) inherited via the worktree's common dir hung the same way.

The bug

Autofix commits and pushes lived at three sites, each calling bare git commit/git push under the run's long-lived context with no headless safeguards beyond GIT_TERMINAL_PROMPT:

  • internal/pipeline/steps/common_fix.gocommitAgentFixes (review/test/lint/document fix): git.Run(..., "commit", "-m", msg).
  • internal/pipeline/steps/push.goPushStep: git.Run(..., "commit", ...) + git.Push(...).
  • internal/pipeline/steps/ci_fix.goCIStep.commitAndPush/pushUpdatedHeadSHA: stepGitRun(..., "commit", ...) + stepGitPush(...).

None disabled signing, none imposed a per-op deadline, and the CI path (stepGitRun) didn't even apply NonInteractiveEnv in production (it relied on the parent process env). So a signing policy or a single blocking hook/SSH prompt stalled the whole run.

The fix

Option (a) — the small, ships-fast path recommended in the audit. All three autofix sites now funnel through two new helpers in internal/pipeline/steps/autofix.go:

  • autofixCommit(sctx, message) — runs git -c commit.gpgsign=false commit -m <msg>, and when git.CommitSigningEnabled resolves true, logs a one-time warning that the autofix will be unsigned. Signing is the correct behavior to skip for machine-generated commits, and it's the only thing that works headlessly.
  • autofixPush(sctx, remote, ref, expectedSHA, forceWithLease) — runs the force-with-lease push with the same safeguards as the commit.

Both helpers:

  • Derive a short context.WithTimeout(sctx.Ctx, autofixGitTimeout) (default 2m, far below any CI timeout) that still inherits run cancellation, so a hung pre-commit/pre-push hook or SSH prompt fails the step in seconds instead of hours.
  • Always apply git.NonInteractiveEnv (merged under sctx.Env so tests can still inject a fake git), closing the latent gap where the CI-fix path ran without headless env.

Supporting changes:

  • git.NonInteractiveEnv now sets SSH_ASKPASS_REQUIRE=never so ssh fails fast instead of blocking on a passphrase prompt. The happy path (agent holds the key) is unaffected.
  • git.CommitSigningEnabled(ctx, dir) reads the effective commit.gpgsign (local/global/system/includes via --type=bool); returns false on any error including the key being unset.
  • common_exec.go: stepCmd/stepGitRun gained explicit-context variants (stepCmdCtx/stepGitRunCtx) so the autofix helpers can scope a shorter deadline. The now-unused stepGitPush wrapper was removed.

The e2e harness's own git config commit.gpgsign false is left in place — it covers the harness's direct setup commits, and is now redundant (not wrong) for the autofix path.

Verification

gofmt -l . clean · go vet ./... clean · go run ./cmd/genskill --check up to date · go build ./... OK · go test -race ./... green (Go 1.26.4 — system Go 1.18 is too old).

New tests pinning each fix (the defining symptoms are a hang and a headless signing failure, so these exist to catch regressions):

  • TestCommitAgentFixes_GpgSignEnabledCommitsUnsignedWithWarning (internal/pipeline/steps/autofix_test.go) — with commit.gpgsign=true set, an autofix commit lands successfully, emits the unsigned warning, and the resulting commit is verified unsigned (git log -1 --pretty=%G?N); HEAD advances and is persisted.
  • TestCommitAgentFixes_HungHookFailsFast (internal/pipeline/steps/autofix_test.go) — a commit that would hang (a fake git binary that blocks forever on the commit subcommand) fails within the scoped timeout (~300ms) rather than blocking; HEAD does not advance. Uses the existing fake-CLI harness so the timeout cancellation kills the direct child with no orphaned subprocess.
  • TestNonInteractiveEnv_SetsGitOverrides / _OverridesAmbientEditor (internal/git/env_test.go) — SSH_ASKPASS_REQUIRE=never is present and overrides ambient values.
  • TestCommitSigningEnabled (internal/git/git_test.go) — unset → false, local true → true, explicit false → false.

Existing commitAndPush / push-step tests (including the fake-git sctx.Env injection tests) continue to pass unchanged, confirming the refactor is behavior-preserving on the happy path.

Notes for review

  • The 2-minute default ceiling on autofix commit/push is a balance: generous for a legit pre-commit/pre-push hook on a large repo, but far below the run's CI timeout so a genuinely stuck op fails fast. It's a package-level var (not a const) purely so the fail-fast test can shrink it; happy-path commits finish in well under the ceiling. If a maintainer prefers a different value or separate commit vs. push ceilings, it's a one-line change.
  • I went with option (a) (sign-out the autofix) rather than option (b) (copy signing config + forward GPG_TTY/SSH_AUTH_SOCK) deliberately: autofix commits are machine-generated, can't be signed headlessly in the general case, and (a) is the small ships-fast path the audit recommends. If the project ever wants signed autofix commits, (b) is an additive follow-up on top of this.
  • This change touches push.go and ci_fix.go, which other open fork PRs also touch; the fixes are orthogonal to credential redaction (fix(security): redact embedded credentials from upstream URLs in logs and errors #299) and fork→parent routing (fix(scm): no-mistakes cannot ship fork-based contributions — push and PR share one upstream_url, no fork→parent routing #293). No coordination needed for this PR to be correct on current main.
  • AGENTS.md gains one line recording the convention that autofix commit/push must stay headless-safe (route through autofixCommit/autofixPush), to prevent future regressions to bare git.Run/git.Push.

AI disclosure: Human-reviewed. The change was produced with AI assistance and reviewed by a human contributor before submission.

Audit v5 §4 (commit signing, HIGH for enterprise) + §13 (hooks/SSH hang, MED).

Autofix commits were plain `git commit -m`, so orgs that mandate
`commit.gpgsign=true` could not autofix headlessly: every commit attempted
to sign and failed with 'gpg: signing failed: Inappropriate ioctl for device'
or hung to pinentry-timeout. Separately, an interactive pre-commit/pre-push
hook or an SSH passphrase prompt (passphrased key, no agent) opened /dev/tty
and blocked the pipeline until CITimeout.

Fix (option a, the small ships-fast path):

- Signing: route every autofix commit through `git -c commit.gpgsign=false`
  and warn once when signing was effectively enabled, so the autofix lands
  unsigned instead of wedging. Detected via new git.CommitSigningEnabled.
- SSH: add SSH_ASKPASS_REQUIRE=never to git.NonInteractiveEnv so ssh fails
  fast instead of blocking on a passphrase prompt.
- Timeout: wrap every autofix git commit/push in a short context.WithTimeout
  (independent of the run's parent ctx) so a hung hook fails the step in
  seconds rather than hours.

All three autofix sites funnel through new steps.autofixCommit/autofixPush
(common_fix.go, push.go, ci_fix.go), which also guarantee NonInteractiveEnv
applies in production (the CI-fix path previously ran without it).
@kunchenguid

Copy link
Copy Markdown
Owner

Thanks for the contribution, @e-jung! One process note on this repo: PRs need to be raised through no-mistakes (git push no-mistakes), which runs the review/test/lint/CI pipeline and stamps the required signature into the PR body. This PR is currently failing the "PR must be raised via no-mistakes" check because it wasn't raised that way.

If working from a fork was the blocker, that's fixed as of v1.30.0 (#306). Per CONTRIBUTING.md: point origin at this parent repo, run no-mistakes init --fork-url <your-fork-url>, then git push no-mistakes.

I won't be merging PRs that bypass the gate going forward, but I'd genuinely love to land your work once it's re-raised. Thanks for understanding! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants