fix: address top-7 code-review findings across packages #721
Merged
Apply seven hardening items identified in a cross-package review:

1. Evidence collector: aggregate per-section errors via stderrors.Join, bound bash subprocess with context.WithTimeout, capture stdout/stderr into bounded buffers and emit via slog, probe bash/kubectl in PATH once at start, add WithNoCluster() short-circuit for test isolation, capture os.RemoveAll cleanup error, and ctx-check renderer index step.
2. HTTP handlers: wrap POST bodies with http.MaxBytesReader (8 MiB bundle, 1 MiB recipe), translate *http.MaxBytesError to 413 with structured error code, and tag every handler slog call with requestID via new server.RequestIDFromContext accessor.
3. OCI push: add per-attempt context.WithTimeout, bounded retry with exponential backoff + jitter on transient errors, slog.Warn when InsecureTLS=true, base transport on defaults.NewHTTPTransport(), and set explicit oras.CopyOptions.Concurrency.
4. K8s pod helpers: extract pod.GetPodForJob, pod.WaitForTermination (with one-shot watch retry), and pod.WaitForJobTerminal (Failed-as-completion mode). Refactor pkg/validator/job to delegate. Wrap bare returns in EnsureRBAC and replace stringified CleanupRBAC join with stderrors.Join + WrapWithContext.
5. Build spec: make WriteBack atomic via temp+fsync+rename, cap LoadSpec input at MaxSpecFileBytes (1 MiB), add KnownFields(true) to YAML decode, and replace os.IsNotExist with errors.Is(fs.ErrNotExist).
6. Folded into items 1 and 2.
7. K8s collector: parallelize sub-collectors via errgroup, paginate collectContainerImages at K8sPodListPageSize=500, and cache the dynamic client via sync.Once.

Tests added/updated for each change. golangci-lint passes 0 issues across all modified packages; coverage on changed scope is 77.3% (above the 70% project floor). Pre-existing unrelated sandbox-only test failures in pkg/bundler/deployer/helm and pkg/trust are not touched by this PR.
Apply 12 inline review comments:

- pkg/build/spec.go: use os.CreateTemp for a unique per-write tmp file so concurrent WriteBack callers cannot clobber each other; fsync the parent directory after rename for crash-safety; add a renameFn package seam so tests can deterministically exercise the rename failure path. Drop redundant chmod (CreateTemp opens with 0o600 by default).
- pkg/build/spec_test.go: rename misleading test from RenameFailurePreservesOriginal -> TempCreateFailurePreservesOriginal to reflect what it actually exercises (read-only directory blocks CreateTemp). Add a real RenameFailurePreservesOriginal that uses the renameFn seam. Update glob checks for the new ".tmp-*" pattern.
- pkg/k8s/pod/job.go: in WaitForJobCompletion and WaitForJobTerminal, preserve ErrCodeTimeout classification when the watch channel closes after the deadline (previously surfaced as ErrCodeInternal). Handle watch.Error events explicitly with ErrCodeInternal instead of silently continuing on the type-assert miss.
- pkg/evidence/collector.go: check ctx.Err() before each section dispatch so cancellation stops creating temp work after the caller aborts. Surface section subprocess timeouts as ErrCodeTimeout (was ErrCodeInternal). Aggregate-level error preserves ErrCodeTimeout when any section reported a timeout.
- pkg/validator/job/deployer.go: WaitForCompletion now propagates pod.WaitForJobTerminal errors as-is so structured codes (ErrCodeTimeout, ErrCodeUnavailable) survive for retry classification. WaitForPodTermination only swallows ErrCodeNotFound; other errors (RBAC, transient API failures) propagate so cleanup can be race-aware.
- pkg/oci/push.go: clone existing TLSClientConfig (or allocate fresh) and only mutate InsecureSkipVerify, preserving any future hardening defaults from defaults.NewHTTPTransport (MinVersion, cipher suites).
- pkg/bundler/handler_test.go, pkg/recipe/handler_test.go, pkg/recipe/handler_query_test.go: tighten 413 assertions to verify the exact limit_bytes value matches defaults.MaxBundlePOSTBytes / MaxRecipePOSTBytes. Use a valid JSON envelope for the recipe and query tests so MaxBytesReader detection is deterministic rather than coupled to early JSON syntax errors.

Tests added for the new error-handling paths:

- WaitForJobCompletion_WatchError, _WatchClosedAfterTimeout
- WaitForJobTerminal_WatchError, _WatchClosedAfterTimeout
- evidence: TestRunPreservesTimeoutCode, TestRunStopsOnContextCancellation
- validator/job: TestWaitForPodTerminationPropagatesNonNotFound

golangci-lint passes 0 issues across all modified packages.
CI's pinned golangci-lint v2.10.1 with gosec flags os.Remove(tmp) on the WriteBack failure-cleanup paths because tmp's filename derives from the path argument via os.CreateTemp. The tmp variable is the return value of f.Name() from a CreateTemp call earlier in the same function; removing it is the inverse of that creation, not a tainted-path op. Reproduced with the pinned linter version (local v2.11.4 didn't flag).
Summary
Applies seven hardening items identified in a cross-package code review covering error handling, ingress safety, retry/backoff for OCI push, atomic file writes, K8s helper extraction, and collector parallelism.
Motivation / Context
A cross-package review surfaced consistent gaps: unbounded HTTP POST bodies, an os/exec bash collector lacking project conventions, a non-atomic state file write, missing retries on registry pushes, duplicated K8s utilities that bypassed pkg/k8s/pod, sequential sub-collectors that could parallelize, and handler logs missing request-ID correlation. This PR addresses all seven prioritized findings.

Fixes: N/A
Related: N/A
Type of Change
Component(s) Affected
- cmd/aicrd, pkg/api, pkg/server
- pkg/bundler, pkg/component/*
- pkg/collector, pkg/snapshotter
- pkg/validator
- pkg/errors, pkg/k8s
- pkg/build, pkg/oci, pkg/evidence, pkg/recipe, pkg/defaults

Implementation Notes
- Evidence collector: aggregate per-section errors via stderrors.Join, bound bash subprocess with context.WithTimeout, capture stdout/stderr into bounded buffers and emit via slog, probe bash/kubectl once at start, add WithNoCluster() short-circuit, capture os.RemoveAll cleanup error, ctx-check renderer index step.
- HTTP handlers: http.MaxBytesReader on bundler (8 MiB) and recipe (1 MiB) POSTs; *http.MaxBytesError → 413 with ErrCodeInvalidRequest. New server.RequestIDFromContext accessor; per-request slog logger with requestID in all three handlers.
- OCI push: RegistryPushTimeout=10m, 3-attempt retry with exponential backoff + jitter on transient errors only (5xx/429/timeouts), slog.Warn on InsecureTLS=true, transport from defaults.NewHTTPTransport(), explicit oras.CopyOptions.Concurrency=3.
- K8s pod helpers: pod.GetPodForJob, pod.WaitForTermination (with one-shot watch retry), pod.WaitForJobTerminal (Failed-as-completion mode). pkg/validator/job refactored to delegate; bare returns in EnsureRBAC wrapped; CleanupRBAC switched from string-joined error to stderrors.Join + WrapWithContext (preserves underlying error codes for errors.Is/As).
- Build spec: atomic WriteBack via temp+fsync+rename, MaxSpecFileBytes=1MiB cap on LoadSpec, KnownFields(true) YAML decoder, errors.Is(fs.ErrNotExist) replacing os.IsNotExist.
- K8s collector: sub-collectors parallelized via errgroup.WithContext, collectContainerImages paginated at K8sPodListPageSize=500, dynamic client cached via sync.Once.

Behavior change: Deployer.WaitForPodTermination now returns error (was no return). Single in-tree caller updated.

Testing
- golangci-lint run -c .golangci.yaml on all modified packages: 0 issues.
- Coverage: pkg/evidence 59.4% → 78.1%, pkg/k8s/pod 81.6%, pkg/server 94.0%, pkg/oci 77.6%, pkg/recipe 90.7%, pkg/validator/job 83.8%, pkg/build (with new tests for atomic write, oversize file, unknown YAML field).
- Pre-existing sandbox-only test failures in pkg/bundler/deployer/helm (mktemp on /var/folders) and pkg/trust (writes to ~/.sigstore) are untouched by this PR.

Risk Assessment
Rollout notes:
- Deployer.WaitForPodTermination now returns error. Internal-only; sole caller (pkg/validator/validator.go) updated to log a warning on the new return value.
- pkg/defaults constants (MaxBundlePOSTBytes, MaxRecipePOSTBytes, RegistryPushTimeout, RegistryPushRetries, RegistryPushBackoff, OCIPushConcurrency, EvidenceSectionTimeout, EvidenceMaxOutputBytes, K8sPodListPageSize, MaxSpecFileBytes, SpecFileMode).

Checklist
- make test (with -race)
- make lint
- git commit -S