feat(recipes): pin chart versions for NVIDIA-owned components (#748 Phase B) #777
…hase B)

Closes #748. Implements the chart-version pinning policy from ADR-006 for the four NVIDIA-owned helm components that previously rendered upstream-latest non-deterministically:

- gpu-operator v26.3.1
- network-operator 26.1.1
- nvidia-dra-driver-gpu 25.12.0
- nodewright-operator v0.15.1 (skyhook chart latest)

All four versions are the current upstream stable. Verified that our existing values.yaml shapes render against them. The deployed image set is byte-identical to today's upstream-latest output (the BOM doc auto-regeneration only updates the four version columns from "—" to their pinned values).

Also wires `make bom BOM_STRICT=1` into `make qualify` per ADR-006's adoption plan: a new `bom-pinning-check` lint target invokes the BOM tool with -strict to fail CI when any helm component is missing `defaultVersion`. New components landing without a pin will be rejected automatically.

SECURITY.md updated with a new "Deployed Image Inventory and Pinning Policy" subsection that points at the published BOM, summarizes the three-layer ADR-006 contract, and acknowledges the upstream-signing gap.

The four upstream signing requests have been filed:

- NVIDIA/gpu-operator#2432
- Mellanox/network-operator#2555
- kubernetes-sigs/dra-driver-nvidia-gpu#1105
- NVIDIA/nodewright#224

Tracking under #739 Stage 3.
🌿 Preview your docs: https://nvidia-preview-feat-pin-nvidia-chart-versions.docs.buildwithfern.com/aicr
No actionable comments were generated in the recent review.

Walkthrough

This pull request implements Helm chart version pinning for four components in the registry (gpu-operator, network-operator, nodewright-operator, and nvidia-dra-driver-gpu) to satisfy ADR-006 reproducibility requirements. Changes include adding a new Makefile target …

Estimated code review effort: 2 (Simple), ~15 minutes

Pre-merge checks: 4 passed
Coverage Report ✅

No Go source files changed in this PR.
Summary
Closes #748. Implements ADR-006 Phase B: pins chart versions for the four NVIDIA-owned helm components that previously rendered upstream-latest non-deterministically. Adds a CI gate that prevents future components from landing without a chart-version pin. Updates `SECURITY.md` to document the new policy. Files four upstream signing requests (linked below) and references them from #739 Stage 3.

Refs #739, #740, #748. Filed alongside as upstream signing requests:

- NVIDIA/gpu-operator#2432
- Mellanox/network-operator#2555
- kubernetes-sigs/dra-driver-nvidia-gpu#1105
- NVIDIA/nodewright#224
Type of Change
Component(s) Affected

- `recipes/registry.yaml`
- `Makefile` (new `bom-pinning-check` lint target)
- `SECURITY.md`, `docs/user/container-images.md`

Implementation Notes
Pinned chart versions. All four are the current upstream stable; verified our existing `values.yaml` shapes render against them.

| Component | Chart version |
| --- | --- |
| gpu-operator | v26.3.1 |
| network-operator | 26.1.1 |
| nvidia-dra-driver-gpu | 25.12.0 |
| nodewright-operator | v0.15.1 |

The deployed image set is byte-identical to today's upstream-latest output: the BOM doc auto-regeneration only updates the four version columns from `—` to their pinned values; image counts and image refs are unchanged.

`bom-pinning-check` CI gate. New `make` target invokes the BOM tool with `-strict -skip-helm` (no network) to fail CI when any helm component is missing `defaultVersion`. Wired into `make lint` alongside `check-agents-sync` and `check-docs-sidebar`. Future PRs adding a new helm component without `defaultVersion` will fail qualify automatically.

SECURITY.md. New "Deployed Image Inventory and Pinning Policy" subsection under Supply Chain Security: points at the published BOM, summarizes the three-layer ADR-006 contract, and acknowledges the upstream-signing gap with a forward reference to the supply-chain epic.
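The rule the `bom-pinning-check` gate enforces can be sketched in a few lines. This is an illustration only, not the project's BOM tool: the `defaultVersion` field comes from the description above, while the `name` and `type` fields, the helper names, and the `example-new-component` entry are assumptions made up for the example.

```python
from typing import Iterable


def missing_pins(components: Iterable[dict]) -> list[str]:
    """Return the names of helm components without a non-empty defaultVersion."""
    missing = []
    for comp in components:
        # Assumed schema: only entries marked type == "helm" are subject to the check.
        if comp.get("type") != "helm":
            continue
        version = comp.get("defaultVersion")
        if not isinstance(version, str) or not version.strip():
            missing.append(comp.get("name", "<unnamed>"))
    return missing


def strict_exit_code(components: Iterable[dict]) -> int:
    """Mimic a strict mode: non-zero exit code when any pin is missing."""
    return 1 if missing_pins(components) else 0


# The four pinned components from this PR, plus a hypothetical unpinned one.
registry = [
    {"name": "gpu-operator", "type": "helm", "defaultVersion": "v26.3.1"},
    {"name": "network-operator", "type": "helm", "defaultVersion": "26.1.1"},
    {"name": "nvidia-dra-driver-gpu", "type": "helm", "defaultVersion": "25.12.0"},
    {"name": "nodewright-operator", "type": "helm", "defaultVersion": "v0.15.1"},
    {"name": "example-new-component", "type": "helm"},  # hypothetical, no pin
]

unpinned = missing_pins(registry)
exit_code = strict_exit_code(registry)
```

With the hypothetical fifth entry present, `unpinned` names it and `exit_code` is non-zero; with only the four pinned components the check passes. Treating an empty or whitespace-only string as missing is a design choice of this sketch, not a documented behavior of the real tool.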
Upstream signing requests. Filed against the four NVIDIA-owned upstream repos asking for keyless cosign signatures, SLSA Build L3 provenance, and SBOM attestations. Verified beforehand that all four images today carry only legacy key-based `.sig` attachments without Fulcio cert / Rekor entry / SLSA predicate / SBOM. Notes:

- `NVIDIA/k8s-dra-driver-gpu` redirects to `kubernetes-sigs/dra-driver-nvidia-gpu` (project moved to kubernetes-sigs); issue filed at the canonical maintenance location.
- `NVIDIA/skyhook` redirects to `NVIDIA/nodewright` (rename); issue filed at the new home.

Testing
`docs/user/container-images.md` regenerated; only the four version columns flipped from `—` to the pinned values, no image-set delta.

Risk Assessment
… `helm template` resolves today. Image set byte-identical. New CI gate only enforces the policy ADR-006 codifies; no functional change to bundles already produced. Easy to revert.

Rollout notes: Existing bundles produced before this PR continue to deploy whatever upstream-latest resolved at their build time; bundles built after this PR deterministically deploy the pinned versions. Renovate will open chart-version bump PRs going forward (already wired under PR #737).
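One way to spot-check a "no image-set delta" claim is to compare the image references in two rendered manifests. The following is a minimal sketch under stated assumptions, not how the BOM tool works: it assumes the rendered YAML is already available as text, matches `image:` lines with a simple regular expression instead of parsing YAML, and uses made-up `example.test` image names.

```python
import re

# Matches lines such as "image: repo/name:tag" or "- image: "repo/name:tag"".
IMAGE_LINE = re.compile(
    r'^[ \t]*(?:-[ \t]*)?image:[ \t]*["\']?([^\s"\']+)["\']?[ \t]*$',
    re.MULTILINE,
)


def image_set(manifest: str) -> set[str]:
    """Collect the distinct image references found in rendered manifest text."""
    return set(IMAGE_LINE.findall(manifest))


def image_delta(before: str, after: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) image references between two renders."""
    old, new = image_set(before), image_set(after)
    return new - old, old - new


# Hypothetical renders: same images, different order and quoting.
unpinned_render = """
containers:
  - name: operator
    image: example.test/operator:v1.0.0
  - name: validator
    image: "example.test/validator:v1.0.0"
"""
pinned_render = """
containers:
  - name: validator
    image: example.test/validator:v1.0.0
  - name: operator
    image: example.test/operator:v1.0.0
"""

added, removed = image_delta(unpinned_render, pinned_render)
```

Both `added` and `removed` come back empty when the two renders reference the same images. The comparison is on exact reference strings, so a tag change or a switch to a digest reference would show up as one added and one removed entry.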
Checklist

- `make test` with `-race`
- `make lint`, including the new `bom-pinning-check`
- `SECURITY.md`, `docs/user/container-images.md`
- `git commit -S`