Summary
Add a per-recipe stability / maturity signal so consumers can distinguish vetted recipes from preview or experimental ones. Today the only maturity signal is the shared apiVersion: aicr.nvidia.com/v1alpha1, which is schema-wide and uniform; there is no way for a consumer to tell a well-exercised h100-eks-ubuntu-inference-dynamo apart from a placeholder overlay staged ahead of a new SKU's GA or a recipe that is a work-in-progress.
Motivation
Consider AICR used for platform-conformance validation (Kubernetes and the layers above it) of a regularly evolving full-stack reference architecture, covering both well-established platforms and those still in development:
- Forward-looking SKU recipes. To add DGX platforms to the recipe matrix (B300, GB300, and later Vera Rubin class) we want to stage overlays alongside new accelerator enum values before those SKUs are broadly deployed or even finalized. A consumer running aicr recipe --accelerator vr200 --service dgx-superpod --intent training should get a clear signal that the returned recipe is a scaffold, not yet a battle-tested spec.
- Experimental recipe variants. Recipes exploring a new platform stack (for example, a variant of -inference- with an alternative gateway) benefit from being discoverable in the library without being quietly selected by criteria-only matching.
- Deprecation. When a recipe is superseded (e.g. rolled into a mixin, or replaced by a newer variant) I don't believe there's a way to mark it (in schema) as deprecated while still shipping it for a transition period.
More broadly: as the recipe library grows past the current ~30 overlays and begins to include contributions from multiple organizations and community members, a lifecycle indicator is useful for maintainers as well as consumers.
Proposal
Add two optional fields to RecipeMetadataSpec (pkg/recipe/metadata.go):
spec:
  stability: preview  # one of: stable | preview | experimental | deprecated
  stabilityNote: "Placeholder for VR200 SKU; enum added ahead of GA. No cluster validation yet."
Enum semantics
| Value | Meaning |
| --- | --- |
| stable | Default when the field is absent. Recipe has been exercised on at least one representative cluster and is intended for general consumption. |
| preview | Recipe is complete and internally consistent but has not yet accumulated real-world validation evidence; consumers should expect iteration. |
| experimental | Recipe is intentionally exploratory (alternate stack choices, partial coverage, investigative). Not a candidate for promotion to stable on its current trajectory. |
| deprecated | Recipe is being phased out. stabilityNote should point to the replacement. |
Compatibility
- Both fields are omitempty. Missing stability means stable, so existing recipes and overlays continue to work unchanged.
- No bump to apiVersion required; the change is additive on v1alpha1.
- Validation: unknown values fail parsing with a clear error (parallel to how ParseCriteriaServiceType handles unknown service types today).
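The compatibility rules above could be sketched roughly as follows. This is a minimal illustration, not the actual AICR code: the Stability type, its constants, and ParseStability are hypothetical names, the struct shows only the two proposed fields, and the struct tags and case-sensitive matching are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Stability is a hypothetical enum type for the proposed spec.stability field.
type Stability string

const (
	StabilityStable       Stability = "stable"
	StabilityPreview      Stability = "preview"
	StabilityExperimental Stability = "experimental"
	StabilityDeprecated   Stability = "deprecated"
)

// RecipeMetadataSpec shows only the two proposed fields. The real struct in
// pkg/recipe/metadata.go carries other fields that are omitted here, and the
// exact tags are an assumption.
type RecipeMetadataSpec struct {
	Stability     Stability `json:"stability,omitempty" yaml:"stability,omitempty"`
	StabilityNote string    `json:"stabilityNote,omitempty" yaml:"stabilityNote,omitempty"`
}

// ParseStability maps a raw field value to a Stability.
// An absent (empty) value defaults to stable; any unrecognized value is an error.
func ParseStability(raw string) (Stability, error) {
	switch s := Stability(strings.TrimSpace(raw)); s {
	case "":
		return StabilityStable, nil
	case StabilityStable, StabilityPreview, StabilityExperimental, StabilityDeprecated:
		return s, nil
	default:
		return "", fmt.Errorf("unknown stability %q: expected one of stable, preview, experimental, deprecated", raw)
	}
}

func main() {
	// An empty value, a known value, and an unknown value.
	for _, raw := range []string{"", "preview", "beta"} {
		s, err := ParseStability(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q resolves to %s\n", raw, s)
	}
}
```

Resolving the default at parse time means downstream code never has to special-case the empty string, while omitempty keeps existing recipe files byte-for-byte unchanged when they are re-serialized.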
CLI surface
Two minimal CLI additions leverage the new field:
- aicr recipe --stability stable: filter candidate overlays by stability during matching. Default behavior omits experimental and deprecated unless explicitly requested.
- aicr recipe list output (once that subcommand lands; currently under consideration in the roadmap) surfaces the stability column alongside name and criteria.
Validators (aicr validate --phase ...) do not filter by stability — they run whatever recipe they are pointed at. The filter is a recipe-selection concern, not a validation one.
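One way the selection-time filter might look, as a sketch under stated assumptions: Overlay, FilterByStability, and defaultAllowed are hypothetical names rather than existing AICR symbols, and the third overlay name in the demo is invented for illustration. The default set (stable and preview) follows from the statement that experimental and deprecated are omitted unless explicitly requested.

```go
package main

import "fmt"

// Stability mirrors the proposed enum (hypothetical type name).
type Stability string

const (
	StabilityStable       Stability = "stable"
	StabilityPreview      Stability = "preview"
	StabilityExperimental Stability = "experimental"
	StabilityDeprecated   Stability = "deprecated"
)

// Overlay is a hypothetical stand-in for a candidate recipe overlay.
type Overlay struct {
	Name      string
	Stability Stability // empty is treated as stable
}

// defaultAllowed returns the levels matched when no --stability flag is given:
// experimental and deprecated are left out.
func defaultAllowed() map[Stability]bool {
	return map[Stability]bool{StabilityStable: true, StabilityPreview: true}
}

// FilterByStability keeps the overlays whose effective stability is in allowed.
// A nil or empty allowed set falls back to the default set.
func FilterByStability(overlays []Overlay, allowed map[Stability]bool) []Overlay {
	if len(allowed) == 0 {
		allowed = defaultAllowed()
	}
	var out []Overlay
	for _, o := range overlays {
		s := o.Stability
		if s == "" {
			s = StabilityStable // absent field means stable
		}
		if allowed[s] {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	candidates := []Overlay{
		{Name: "h100-eks-ubuntu-inference-dynamo"},
		{Name: "vr200-dgx-superpod-ubuntu-training-runai", Stability: StabilityPreview},
		{Name: "example-experimental-variant", Stability: StabilityExperimental}, // invented name
	}
	// Default selection: the experimental candidate is filtered out.
	for _, o := range FilterByStability(candidates, nil) {
		fmt.Println(o.Name)
	}
}
```

Keeping the filter as a pure function over the candidate list leaves criteria matching untouched, which is consistent with validators ignoring stability entirely.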
Evidence / provenance (optional, non-blocking)
A follow-on consideration, not required in the initial change, is to let stabilityNote carry a structured pointer to validation evidence (a SHA, a PR URL, or a path to a VALIDATION-EVIDENCE.md-style matrix entry). That keeps the feature human-readable while leaving space for a tighter schema later.
Example
kind: RecipeMetadata
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: vr200-dgx-superpod-ubuntu-training-runai
spec:
  base: base
  stability: preview
  stabilityNote: >-
    Scaffold for VR200-class DGX SuperPOD deployments. Accelerator enum
    and topology assertions staged ahead of SKU GA; not yet exercised on
    hardware. Graduates to stable after the first cluster run captures
    evidence in the downstream validation matrix.
  criteria:
    service: dgx-superpod
    accelerator: vr200
    intent: training
    os: ubuntu
    platform: runai
Alternatives considered
- Naming convention only (preview-<name>.yaml or recipes/overlays/preview/). Works, but relies entirely on reviewer vigilance and breaks down once recipe names become semantically meaningful on their own. It also makes graduation from preview to stable a filesystem rename rather than a metadata edit, which is noisier.
- Kubernetes-style annotations. A generic metadata.annotations block on RecipeMetadata would be flexible but under-specifies the common case (the stability signal) and pushes the contract into a string-typed map. A dedicated enum is clearer for this specific, well-understood concern; a future general-purpose annotations block could still coexist.
- Do nothing and rely on v1alpha1. Keeps the schema flat but does not scale as the recipe library grows to include pre-GA SKU placeholders and community contributions with varying maturity.
Scope of this issue
This issue scopes the schema change and the minimum-viable CLI consumption of it. Follow-on issues could cover:
- aicr recipe list (dependent on list subcommand landing).
- Stability-aware test matrix (which stability levels are exercised in CI).
- Structured stabilityNote schema (pointer / URI / evidence manifest).
Happy to open a PR against this issue once the proposed direction is confirmed, or to refine the proposal in the comments.
Context
Surfaced in the course of planning AICR integration into a downstream validation framework that explores contributing DGX BasePOD / SuperPOD recipes upstream across current and forward-looking SKU families (B200, B300, GB200, GB300, and later Vera Rubin).