Skip to content

fix(recipe): scope mixin fallback to affected candidates#521

Merged
mchmarny merged 1 commit into
mainfrom
fix/issue-507-mixin-fallback
Apr 10, 2026
Merged

fix(recipe): scope mixin fallback to affected candidates#521
mchmarny merged 1 commit into
mainfrom
fix/issue-507-mixin-fallback

Conversation

@yuanchen8911
Copy link
Copy Markdown
Contributor

@yuanchen8911 yuanchen8911 commented Apr 9, 2026

Summary

Scope mixin fallback to only the affected candidate chains instead of resetting the entire recipe to base-only output. Add structured ExcludedOverlay type with machine-readable reason field.

Motivation / Context

When a mixin constraint fails in evaluateMixinConstraints, the previous behavior excluded all non-base overlays — including unrelated overlays (e.g., monitoring-hpa) that had nothing to do with the failing mixin. ExcludedOverlays also over-reported by including ancestors in the exclusion list.

Fixes: #507
Related: #501

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

  • buildMixinConstraintCandidateIndex maps each mixin constraint → the specific leaf candidates whose inheritance chains contribute it, so failures target specific chains rather than all applied overlays
  • Rebuild surviving spec via mergeOverlayChains so shared ancestors remain present when still needed by another surviving leaf (the old code used flat Merge which lost ancestor ordering)
  • ExcludedOverlays changed from []string to []ExcludedOverlay{Name, Reason} with two reason constants: constraint-failed (pre-merge) and mixin-constraint-failed (post-compose)
  • Backward-compatible UnmarshalYAML/UnmarshalJSON accept both legacy "string" and new {name, reason} object forms
  • evaluateMixinConstraints now returns error instead of silently proceeding when constraint mapping fails
  • ConstraintWarning.Reason format changed from free-text to "mixin-constraint-failed: expected X, got Y"
  • Verified 800 recipe+bundle combos (5 services × 5 accelerators × 2 intents × 4 OSes × 4 platforms): 0 content diffs in Helm values/deploy scripts; 35 recipes have harmless appliedOverlays ordering changes in b200/gb200-training combos due to mergeOverlayChains-based rebuild

Testing

make qualify
  • 5 new/updated test functions in metadata_store_test.go:
    • TestMixinConstraintFailureExcludesOnlyAffectedCandidateChain
    • TestMixinConstraintFailurePreservesSharedAncestorsForSurvivingLeaf
    • TestEvaluateMixinConstraintsReturnsErrorWhenConstraintCannotBeMappedToCandidate
    • Updated TestEvaluatorFailingLeafExcludesCandidate and TestMixinConstraintFailureExcludesCandidate with reason assertions
  • TestHydrateResult/"excluded overlays include reasons" in query_test.go
  • Legacy compat tests in reader_recipe_compat_test.go (YAML + JSON)

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert

Rollout notes: ExcludedOverlays is now []ExcludedOverlay (object) instead of []string. Custom unmarshalers ensure backward compatibility for consumers reading stored recipes in the old format. API consumers will see the new {name, reason} format going forward.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

Coverage Report ✅

Metric Value
Coverage 74.6%
Threshold 70%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-74.6%25-green)

Merging this branch will decrease overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/recipe 90.32% (-0.08%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/recipe/metadata.go 96.52% (-0.70%) 201 (+21) 194 (+19) 7 (+2) 👎
github.com/NVIDIA/aicr/pkg/recipe/metadata_store.go 85.49% (-0.40%) 317 (-2) 271 (-3) 46 (+1) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

mchmarny

This comment was marked as resolved.

When a mixin constraint fails in evaluateMixinConstraints, only exclude
the candidate chains that contributed the failing constraint instead of
resetting the entire recipe to base-only output. Independent overlays
(e.g., monitoring-hpa) that had nothing to do with the failing mixin are
now preserved.

Key changes:
- Build a constraint-to-candidate index so failures target specific
  chains rather than all applied overlays
- Rebuild surviving spec via mergeOverlayChains so shared ancestors
  remain present when still needed by another surviving leaf
- Change ExcludedOverlays from []string to []ExcludedOverlay with a
  machine-readable Reason field (constraint-failed vs
  mixin-constraint-failed)
- Add backward-compatible UnmarshalYAML/UnmarshalJSON for the legacy
  string form of excludedOverlays
- Update OpenAPI schema and docs with the new structured fields

Fixes: #507
@yuanchen8911 yuanchen8911 force-pushed the fix/issue-507-mixin-fallback branch from b84d740 to c84ec51 Compare April 10, 2026 15:51
@yuanchen8911
Copy link
Copy Markdown
Contributor Author

Generally LGTM, it does lower the test coverage in in recipe pkg by ~1.5% so make sure there is adequate coverage

Add additional tests to reover most of the gap. The delta is -0.1%, down from the original -1.43%. The remaining 0.1% is from error paths which would require malformed input that's hard to construct naturally.

@mchmarny mchmarny merged commit a228540 into main Apr 10, 2026
24 checks passed
@mchmarny mchmarny deleted the fix/issue-507-mixin-fallback branch April 10, 2026 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: scope mixin fallback to affected candidate chain only

2 participants