
feat(recipes): pin chart versions for NVIDIA-owned components (#748 Phase B) #777

Merged
mchmarny merged 1 commit into main from feat/pin-nvidia-chart-versions
May 6, 2026


@mchmarny mchmarny commented May 6, 2026

Summary

Closes #748. Implements ADR-006 Phase B: pins chart versions for the four NVIDIA-owned helm components that previously rendered upstream-latest non-deterministically. Adds a CI gate that prevents future components from landing without a chart-version pin. Updates SECURITY.md to document the new policy. Files four upstream signing requests (linked below) and references them from #739 Stage 3.

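Refs #739, #740, #748. Filed alongside as upstream signing requests:

  • NVIDIA/gpu-operator#2432
  • Mellanox/network-operator#2555
  • kubernetes-sigs/dra-driver-nvidia-gpu#1105
  • NVIDIA/nodewright#224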

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Documentation update

Component(s) Affected

  • Recipe data (recipes/registry.yaml)
  • Build/tooling (Makefile — new bom-pinning-check lint target)
  • Docs (SECURITY.md, docs/user/container-images.md)

Implementation Notes

Pinned chart versions. All four are the current upstream stable; verified our existing values.yaml shapes render against them.

| Component | Chart version |
| --- | --- |
| gpu-operator | v26.3.1 |
| network-operator | 26.1.1 |
| nvidia-dra-driver-gpu | 25.12.0 |
| nodewright-operator (skyhook chart) | v0.15.1 |

The deployed image set is byte-identical to today's upstream-latest output: the BOM doc auto-regeneration only updates the four version columns from "—" to their pinned values; image counts and image refs are unchanged.

bom-pinning-check CI gate. A new make target invokes the BOM tool with -strict -skip-helm (no network) and fails CI when any helm component is missing defaultVersion. It is wired into make lint alongside check-agents-sync and check-docs-sidebar, so future PRs that add a helm component without defaultVersion will fail make qualify automatically.
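
For illustration only, the rule the gate enforces can be sketched in a few lines of Python. This is not the BOM tool's actual code. The in-memory registry shape (a `name` plus an optional `helm.defaultVersion`) and the helper names `find_unpinned` and `strict_exit_code` are assumptions made for this sketch; only the component names and versions come from the table above.

```python
from typing import Iterable, Mapping


def find_unpinned(components: Iterable[Mapping]) -> list[str]:
    """Return the sorted names of helm components with no defaultVersion."""
    unpinned = []
    for comp in components:
        helm = comp.get("helm")
        if helm is None:
            continue  # not a helm component, so the pin rule does not apply
        version = (helm.get("defaultVersion") or "").strip()
        if not version:
            unpinned.append(comp["name"])
    return sorted(unpinned)


def strict_exit_code(components: Iterable[Mapping]) -> int:
    """Mimic a strict-mode gate: report offenders, return non-zero on failure."""
    missing = find_unpinned(components)
    for name in missing:
        print(f"missing defaultVersion: {name}")
    return 1 if missing else 0


# Hypothetical registry slice (assumed shape), using the pins from this PR.
REGISTRY = [
    {"name": "gpu-operator", "helm": {"defaultVersion": "v26.3.1"}},
    {"name": "network-operator", "helm": {"defaultVersion": "26.1.1"}},
    {"name": "nvidia-dra-driver-gpu", "helm": {"defaultVersion": "25.12.0"}},
    {"name": "nodewright-operator", "helm": {"defaultVersion": "v0.15.1"}},
]
```

With all four pins present, `strict_exit_code(REGISTRY)` returns 0. Appending an entry with an empty `helm` mapping makes it return 1, which is the behavior the lint target is meant to surface in CI.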

SECURITY.md. New "Deployed Image Inventory and Pinning Policy" subsection under Supply Chain Security: points at the published BOM, summarizes the three-layer ADR-006 contract, and acknowledges the upstream-signing gap with a forward reference to the supply-chain epic.

Upstream signing requests. Filed against the four NVIDIA-owned upstream repos asking for keyless cosign signatures, SLSA Build L3 provenance, and SBOM attestations. Verified beforehand that all four images today carry only legacy key-based .sig attachments without Fulcio cert / Rekor entry / SLSA predicate / SBOM. Notes:

  • NVIDIA/k8s-dra-driver-gpu redirects to kubernetes-sigs/dra-driver-nvidia-gpu (project moved to kubernetes-sigs); issue filed at the canonical maintenance location.
  • NVIDIA/skyhook redirects to NVIDIA/nodewright (rename); issue filed at the new home.

Testing

unset GITLAB_TOKEN && make qualify
# Codebase qualification completed (now includes bom-pinning-check)

make bom BOM_STRICT=1
# bom: wrote ... (22 components, 69 image refs)

docs/user/container-images.md regenerated; only the four version columns flipped from "—" to the pinned values, with no image-set delta.
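
A "no image-set delta" claim like the one above can be checked mechanically by comparing the image refs from two BOM runs as sets. The snippet below is a minimal sketch of that comparison, not the procedure actually used for this PR. The function name `image_set_delta` and the example image refs are placeholders invented for illustration.

```python
from typing import Iterable


def image_set_delta(before: Iterable[str], after: Iterable[str]) -> dict[str, list[str]]:
    """Compare two collections of image refs, ignoring order and duplicates."""
    before_set, after_set = set(before), set(after)
    return {
        "added": sorted(after_set - before_set),
        "removed": sorted(before_set - after_set),
    }


# Placeholder refs (hypothetical), standing in for the BOM's image column.
unpinned_run = ["registry.example/operator:1.0", "registry.example/driver:2.0"]
pinned_run = ["registry.example/driver:2.0", "registry.example/operator:1.0"]

delta = image_set_delta(unpinned_run, pinned_run)
print(delta)
```

An empty `added` and an empty `removed` list together mean the two runs resolve to the same image set, regardless of the order in which the refs were emitted.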

Risk Assessment

  • Low — Pins to current upstream stable, which is what helm template resolves today. Image set byte-identical. New CI gate only enforces the policy ADR-006 codifies; no functional change to bundles already produced. Easy to revert.

Rollout notes: Existing bundles produced before this PR continue to deploy whatever upstream-latest resolved at their build time; bundles built after this PR deterministically deploy the pinned versions. Renovate will open chart-version bump PRs going forward (already wired under PR #737).

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint) — including the new bom-pinning-check
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (the new lint gate covers the CI side; the BOM tool's strict mode already had unit-test coverage)
  • I updated docs if user-facing behavior changed (SECURITY.md, docs/user/container-images.md)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

…hase B)

Closes #748. Implements the chart-version pinning policy from ADR-006
for the four NVIDIA-owned helm components that previously rendered
upstream-latest non-deterministically:

- gpu-operator             v26.3.1
- network-operator          26.1.1
- nvidia-dra-driver-gpu     25.12.0
- nodewright-operator       v0.15.1 (skyhook chart latest)

All four versions are the current upstream stable. Verified that our
existing values.yaml shapes render against them. The deployed image
set is byte-identical to today's upstream-latest output (the BOM doc
auto-regeneration only updates the four version columns from "—" to
their pinned values).

Also wires `make bom BOM_STRICT=1` into `make qualify` per ADR-006's
adoption plan: a new `bom-pinning-check` lint target invokes the BOM
tool with -strict to fail CI when any helm component is missing
`defaultVersion`. New components landing without a pin will be
rejected automatically.

SECURITY.md updated with a new "Deployed Image Inventory and Pinning
Policy" subsection that points at the published BOM, summarizes the
three-layer ADR-006 contract, and acknowledges the upstream-signing
gap. The four upstream signing requests have been filed:

- NVIDIA/gpu-operator#2432
- Mellanox/network-operator#2555
- kubernetes-sigs/dra-driver-nvidia-gpu#1105
- NVIDIA/nodewright#224

Tracking under #739 Stage 3.
@mchmarny mchmarny requested review from a team as code owners May 6, 2026 13:05
@mchmarny mchmarny self-assigned this May 6, 2026
github-actions Bot commented May 6, 2026

@mchmarny mchmarny enabled auto-merge (squash) May 6, 2026 13:08

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: afe03200-c155-47ee-aee8-ba5dff6e0c46

📥 Commits

Reviewing files that changed from the base of the PR and between fb53a87 and 43b529c.

📒 Files selected for processing (4)
  • Makefile
  • SECURITY.md
  • docs/user/container-images.md
  • recipes/registry.yaml

Walkthrough

This pull request implements Helm chart version pinning for four components in the registry (gpu-operator, network-operator, nodewright-operator, and nvidia-dra-driver-gpu) to satisfy ADR-006 reproducibility requirements. Changes include adding a new Makefile target bom-pinning-check to verify all Helm components have pinned chart versions, wiring that target into the lint dependency chain, updating the SECURITY.md documentation with detailed image inventory and pinning governance details, and updating version references in the container images documentation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

  • NVIDIA/aicr#758: Adds Helm chart version pins by populating helm.defaultVersion entries in recipes/registry.yaml, following the same pinning pattern and addressing the same ADR-006 requirements.

Suggested reviewers

  • iamkhaledh
  • lalitadithya
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly describes the main change: pinning chart versions for NVIDIA-owned components and implementing ADR-006 Phase B as outlined in the objectives. |
| Description check | ✅ Passed | The PR description is comprehensive and directly related to the changeset, detailing the chart version pins, CI gate implementation, and security policy updates. |
| Linked Issues check | ✅ Passed | The PR successfully addresses the main coding requirements from issue #748: pins four NVIDIA-owned chart versions in recipes/registry.yaml, adds bom-pinning-check CI enforcement, and updates documentation. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope of #748 Phase B: chart version pinning for NVIDIA components, CI enforcement via Makefile, and security/user documentation updates. |


@mchmarny mchmarny merged commit 45e3b8e into main May 6, 2026
94 of 96 checks passed
@mchmarny mchmarny deleted the feat/pin-nvidia-chart-versions branch May 6, 2026 13:15

github-actions Bot commented May 6, 2026

Coverage Report ✅

| Metric | Value |
| --- | --- |
| Coverage | 75.1% |
| Threshold | 70% |
| Status | Pass |
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-75.1%25-green)

No Go source files changed in this PR.

Development

Successfully merging this pull request may close these issues.

Pin chart versions for all components in recipes/registry.yaml

2 participants