Proposal: per-recipe stability signal (preview / experimental / stable / deprecated) #613

@resker

Summary

Add a per-recipe stability / maturity signal so consumers can distinguish vetted recipes from preview or experimental ones. Today the only maturity signal is the shared apiVersion: aicr.nvidia.com/v1alpha1, which is schema-wide and identical for every recipe. A consumer therefore cannot tell a well-exercised recipe such as h100-eks-ubuntu-inference-dynamo apart from a placeholder overlay staged ahead of a new SKU's GA, or from a recipe that is still a work in progress.

Motivation

Consider AICR used for platform-conformance validation (Kubernetes and the layers above it) of a regularly evolving full-stack reference architecture, covering both well-established platforms and ones still in development. Three cases come up:

  1. Forward-looking SKU recipes. To add DGX platforms to the recipe matrix (B300, GB300, and later Vera Rubin class) we want to stage overlays alongside new accelerator enum values before those SKUs are broadly deployed or even finalized. A consumer running aicr recipe --accelerator vr200 --service dgx-superpod --intent training should get a clear signal that the returned recipe is a scaffold, not yet a battle-tested spec.
  2. Experimental recipe variants. Recipes exploring a new platform stack (for example, a variant of -inference- with an alternative gateway) benefit from being discoverable in the library without being quietly selected by criteria-only matching.
  3. Deprecation. When a recipe is superseded (e.g. rolled into a mixin, or replaced by a newer variant), there is, as far as I can tell, no way to mark it as deprecated in the schema while still shipping it for a transition period.

More broadly: as the recipe library grows past the current ~30 overlays and begins to include contributions from multiple organizations and community members, a lifecycle indicator is useful for maintainers as well as consumers.

Proposal

Add two optional fields to RecipeMetadataSpec (pkg/recipe/metadata.go):

spec:
  stability: preview        # one of: stable | preview | experimental | deprecated
  stabilityNote: "Placeholder for VR200 SKU; enum added ahead of GA. No cluster validation yet."

Enum semantics

| Value | Meaning |
| --- | --- |
| stable | Default when the field is absent. Recipe has been exercised on at least one representative cluster and is intended for general consumption. |
| preview | Recipe is complete and internally consistent but has not yet accumulated real-world validation evidence; consumers should expect iteration. |
| experimental | Recipe is intentionally exploratory (alternate stack choices, partial coverage, investigative). Not a candidate for promotion to stable on its current trajectory. |
| deprecated | Recipe is being phased out. stabilityNote should point to the replacement. |
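
A minimal Go sketch of how the enum and the two fields might be modeled. The names here (Stability, the StabilityXxx constants, Effective, encode) and the trimmed-down RecipeMetadataSpec are illustrative assumptions, not the existing contents of pkg/recipe/metadata.go. encoding/json stands in for the project's YAML codec only to show the omitempty behavior with the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stability is a hypothetical string-backed enum for the proposed field.
type Stability string

const (
	StabilityStable       Stability = "stable"
	StabilityPreview      Stability = "preview"
	StabilityExperimental Stability = "experimental"
	StabilityDeprecated   Stability = "deprecated"
)

// Effective applies the proposed default: an absent field means stable.
func (s Stability) Effective() Stability {
	if s == "" {
		return StabilityStable
	}
	return s
}

// RecipeMetadataSpec is a trimmed stand-in for the real struct (assumption);
// only Base and the two proposed optional fields are shown.
type RecipeMetadataSpec struct {
	Base          string    `json:"base,omitempty" yaml:"base,omitempty"`
	Stability     Stability `json:"stability,omitempty" yaml:"stability,omitempty"`
	StabilityNote string    `json:"stabilityNote,omitempty" yaml:"stabilityNote,omitempty"`
}

// encode serializes a spec; empty optional fields are dropped by omitempty.
func encode(spec RecipeMetadataSpec) string {
	out, err := json.Marshal(spec)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	legacy := RecipeMetadataSpec{Base: "base"} // recipe written before the change
	fmt.Println(encode(legacy))                // {"base":"base"}
	fmt.Println(legacy.Stability.Effective())  // stable
}
```

Keeping the zero value distinct from "stable" on the struct, and resolving the default in one helper, means existing files round-trip without gaining a new key.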

Compatibility

  • Both fields are omitempty. A missing stability is treated as stable, so existing recipes and overlays continue to work unchanged.
  • No bump to apiVersion required; the change is additive on v1alpha1.
  • Validation: unknown values fail parsing with a clear error (parallel to how ParseCriteriaServiceType handles unknown service types today).
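
The compatibility rules above could be sketched as a small parser. ParseStability is a hypothetical name chosen to mirror ParseCriteriaServiceType; its signature, the error wording, and the case-sensitive matching are assumptions, not the project's existing behavior.

```go
package main

import "fmt"

// Stability is a hypothetical string-backed enum for the proposed field.
type Stability string

const (
	StabilityStable       Stability = "stable"
	StabilityPreview      Stability = "preview"
	StabilityExperimental Stability = "experimental"
	StabilityDeprecated   Stability = "deprecated"
)

// ParseStability maps the raw field value to a Stability.
// An empty (absent) value defaults to stable; unknown values are rejected.
func ParseStability(raw string) (Stability, error) {
	switch s := Stability(raw); s {
	case "":
		return StabilityStable, nil
	case StabilityStable, StabilityPreview, StabilityExperimental, StabilityDeprecated:
		return s, nil
	default:
		return "", fmt.Errorf(
			"invalid stability %q: expected one of stable, preview, experimental, deprecated", raw)
	}
}

func main() {
	for _, raw := range []string{"", "preview", "beta"} {
		s, err := ParseStability(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", raw, s)
	}
}
```

Listing the accepted values in the error message keeps a typo in an overlay cheap to diagnose.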

CLI surface

Two minimal CLI additions leverage the new field:

  • aicr recipe --stability stable: filters candidate overlays by stability during matching. By default, experimental and deprecated overlays are omitted unless explicitly requested.
  • aicr recipe list output (once that subcommand lands; currently under consideration in the roadmap) surfaces the stability column alongside name and criteria.

Validators (aicr validate --phase ...) do not filter by stability; they run whatever recipe they are pointed at. The filter is a recipe-selection concern, not a validation one.
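
A rough sketch of the selection-time filter, under stated assumptions: the Overlay type and FilterByStability are hypothetical, the default set is taken to be stable plus preview (the text only says experimental and deprecated are omitted), and accepting several requested values at once is an assumption about how --stability might be parsed.

```go
package main

import "fmt"

// Stability is a hypothetical string-backed enum for the proposed field.
type Stability string

const (
	StabilityStable       Stability = "stable"
	StabilityPreview      Stability = "preview"
	StabilityExperimental Stability = "experimental"
	StabilityDeprecated   Stability = "deprecated"
)

// Overlay is a hypothetical stand-in for a candidate recipe overlay.
type Overlay struct {
	Name      string
	Stability Stability // empty means the field was absent
}

// FilterByStability keeps overlays whose effective stability is allowed.
// With no requested values, stable and preview are allowed (assumed default).
func FilterByStability(candidates []Overlay, requested ...Stability) []Overlay {
	allowed := map[Stability]bool{}
	if len(requested) == 0 {
		allowed[StabilityStable] = true
		allowed[StabilityPreview] = true
	}
	for _, s := range requested {
		allowed[s] = true
	}
	var out []Overlay
	for _, c := range candidates {
		effective := c.Stability
		if effective == "" {
			effective = StabilityStable // absent field defaults to stable
		}
		if allowed[effective] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	candidates := []Overlay{
		{Name: "h100-eks-ubuntu-inference-dynamo"}, // no field: stable
		{Name: "vr200-dgx-superpod-ubuntu-training-runai", Stability: StabilityPreview},
		{Name: "example-alt-gateway", Stability: StabilityExperimental}, // hypothetical
		{Name: "example-superseded", Stability: StabilityDeprecated},    // hypothetical
	}
	for _, o := range FilterByStability(candidates) {
		fmt.Println("default:", o.Name)
	}
	for _, o := range FilterByStability(candidates, StabilityExperimental) {
		fmt.Println("explicit:", o.Name)
	}
}
```

The filter runs only during matching; a validator pointed at example-alt-gateway directly would still run it, consistent with the paragraph above.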

Evidence / provenance (optional, non-blocking)

A follow-on consideration, not required in the initial change, is to let stabilityNote carry a structured pointer to validation evidence (a SHA, a PR URL, or a path to a VALIDATION-EVIDENCE.md-style matrix entry). That keeps the feature human-readable while leaving space for a tighter schema later.

Example

kind: RecipeMetadata
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: vr200-dgx-superpod-ubuntu-training-runai
spec:
  base: base
  stability: preview
  stabilityNote: >-
    Scaffold for VR200-class DGX SuperPOD deployments. Accelerator enum
    and topology assertions staged ahead of SKU GA; not yet exercised on
    hardware. Graduates to stable after the first cluster run captures
    evidence in the downstream validation matrix.
  criteria:
    service: dgx-superpod
    accelerator: vr200
    intent: training
    os: ubuntu
    platform: runai

Alternatives considered

  • Naming convention only (preview-<name>.yaml or recipes/overlays/preview/). Works, but relies entirely on reviewer vigilance and breaks down once recipe names become semantically meaningful on their own. It also makes graduation from preview to stable a filesystem rename rather than a metadata edit, which is noisier.
  • Kubernetes-style annotations. A generic metadata.annotations block on RecipeMetadata would be flexible but under-specifies the common case (the stability signal) and pushes the contract into a string-typed map. A dedicated enum is clearer for this specific, well-understood concern; a future general-purpose annotations block could still coexist.
  • Do nothing and rely on v1alpha1. Keeps the schema flat but does not scale as the recipe library grows to include pre-GA SKU placeholders and community contributions with varying maturity.

Scope of this issue

This issue scopes the schema change and the minimum-viable CLI consumption of it. Follow-on issues could cover:

  • aicr recipe list (dependent on list subcommand landing).
  • Stability-aware test matrix (which stability levels are exercised in CI).
  • Structured stabilityNote schema (pointer / URI / evidence manifest).

Happy to open a PR against this issue once the proposed direction is confirmed, or to refine the proposal in the comments.

Context

Surfaced in the course of planning AICR integration into a downstream validation framework that explores contributing DGX BasePOD / SuperPOD recipes upstream across current and forward-looking SKU families (B200, B300, GB200, GB300, and later Vera Rubin).
