fix(ci): trigger H100 GPU tests on shared recipe changes #717

Merged
mchmarny merged 2 commits into NVIDIA:main from yuanchen8911:fix/h100-gpu-recipe-path-filter
Apr 30, 2026
yuanchen8911 (Contributor) commented Apr 30, 2026

Summary

Trigger the real nvkind H100 GPU workflows when shared recipe data changes.

Motivation / Context

#715 changes recipes/registry.yaml and recipes/overlays/base.yaml, but the H100 nvkind GPU workflows were skipped because their path filters matched only the narrower kind/H100 overlay and component paths. Shared recipe surfaces affect the rendered H100 kind bundles, so changes to them should not bypass real GPU coverage.

recipes/data.go is also part of this surface because it defines the embedded recipe filesystem used by the CLI that the H100 jobs build before generating the runtime bundle.

Fixes: N/A
Related: #715

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: CI / GitHub Actions

Implementation Notes

Adds shared H100 bundle inputs to both H100 GPU workflow path filters:

  • recipes/data.go
  • recipes/registry.yaml
  • recipes/overlays/base.yaml
  • common base component value directories used by the H100 kind recipes

The filter stays focused on inputs the PR-triggered H100 kind runtime jobs actually consume. Example-only changes, recipe health-check fixtures, and non-kind OS mixins remain out of the expensive GPU gate.
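The effect of widening the filter can be illustrated with a small Go sketch. It is not the workflows' actual matcher: the glob semantics are simplified, `narrowFilter` is a hypothetical stand-in for the pre-change patterns (the real ones are not shown in this PR description), and `recipes/components/gpu-operator/**` stands for just one of the component directories. Only the three shared file paths are taken verbatim from the list above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesFilter reports whether changedPath is selected by any pattern.
// Simplified semantics (an assumption, not the workflow's real matcher):
//   - a pattern ending in "/**" matches everything under that directory;
//   - any other pattern is matched with path.Match against the full path.
func matchesFilter(patterns []string, changedPath string) bool {
	for _, p := range patterns {
		if strings.HasSuffix(p, "/**") {
			dir := strings.TrimSuffix(p, "/**")
			if strings.HasPrefix(changedPath, dir+"/") {
				return true
			}
			continue
		}
		if ok, err := path.Match(p, changedPath); err == nil && ok {
			return true
		}
	}
	return false
}

// narrowFilter is a HYPOTHETICAL stand-in for the pre-change filter,
// which matched only H100/kind-specific paths.
var narrowFilter = []string{
	"recipes/overlays/h100-*.yaml", // hypothetical overlay pattern
}

// sharedInputs are the shared paths named in the implementation notes.
var sharedInputs = []string{
	"recipes/data.go",
	"recipes/registry.yaml",
	"recipes/overlays/base.yaml",
	"recipes/components/gpu-operator/**", // one example component directory
}

// widenedFilter returns the narrow patterns plus the shared inputs.
func widenedFilter() []string {
	return append(append([]string{}, narrowFilter...), sharedInputs...)
}

func main() {
	// Two paths changed by #715, plus a docs path that should stay out of the gate.
	changed := []string{
		"recipes/registry.yaml",
		"recipes/overlays/base.yaml",
		"docs/README.md", // hypothetical file name
	}
	for _, f := range changed {
		fmt.Printf("%-28s narrow=%-5v widened=%v\n",
			f, matchesFilter(narrowFilter, f), matchesFilter(widenedFilter(), f))
	}
}
```

Under these assumptions, the shared recipe files are missed by the narrow patterns and selected by the widened set, while unrelated paths such as documentation still do not trigger the GPU jobs.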

Testing

actionlint .github/workflows/gpu-h100-inference-test.yaml .github/workflows/gpu-h100-training-test.yaml
git diff --check

Full make qualify skipped: this is an infra-only workflow path-filter change with no runtime code changes.

Risk Assessment

  • Low — CI-only path-filter change; easy to revert. Main impact is intentionally running H100 GPU workflows for more recipe-data changes.

Rollout notes: Merge before rebasing #715 so its push mirror picks up the corrected H100 workflow filters.

Checklist

  • Tests pass locally (make test with -race) (N/A — workflow-only CI path-filter change)
  • Linter passes (make lint) (actionlint scoped to changed workflows)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (N/A — path-filter correction)
  • I updated docs if user-facing behavior changed (N/A)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

coderabbitai Bot commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a8333171-c92d-4dc5-af5f-883d20287585

📥 Commits

Reviewing files that changed from the base of the PR and between ac40c0d and efc3f2a.

📒 Files selected for processing (1)
  • .github/workflows/gpu-h100-inference-test.yaml

📝 Walkthrough

Two GitHub Actions workflow files were modified to expand path-based triggers:

  • .github/workflows/gpu-h100-inference-test.yaml: the check-paths filter was extended to include additional recipe files and component directories (e.g., recipes/data.go, recipes/registry.yaml, recipes/overlays/base.yaml, and recipes/components/*/**).
  • .github/workflows/gpu-h100-training-test.yaml: the check-paths filter was similarly expanded to include recipes/data.go, recipes/registry.yaml, recipes/overlays/base.yaml, and several recipes/components/* directories (cert-manager, gpu-operator, k8s-ephemeral-storage-metrics, kai-scheduler, kube-prometheus-stack, nfd, nodewright-operator, nvidia-dra-driver-gpu, nvsentinel).

No other workflow logic or exported/public entities were changed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: expanding H100 GPU test triggers to include shared recipe changes. |
| Description check | ✅ Passed | The description is well-related to the changeset, providing clear motivation, context, implementation details, and testing approach for the path filter updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


yuanchen8911 force-pushed the fix/h100-gpu-recipe-path-filter branch from 2568d53 to ac40c0d on April 30, 2026 00:23
yuanchen8911 requested a review from mchmarny on April 30, 2026 00:29
mchmarny enabled auto-merge (squash) on April 30, 2026 00:33
mchmarny merged commit 7b96afa into NVIDIA:main on Apr 30, 2026
28 checks passed
2 participants