Add airgap packaging for Nemotron Customizer #194

Open
rapaul-nv wants to merge 5 commits into NVIDIA-NeMo:romeyn/agentic from rkalaniNV:rapaul/airgap-support

Conversation

@rapaul-nv
Contributor

Summary

This PR adds a lightweight airgap packaging flow for Nemotron Customizer, scoped to src/nemotron/steps.

It introduces a deploy-side runner under deploy/nemotron-customizer/airgap/ that can package the selected workflow steps into:

  • a portable launcher image for running Nemotron CLI and submitting jobs from the airgapped environment
  • deduplicated execution images for the remote step containers used by Lepton, Slurm, or Run:ai

Models, datasets, checkpoints, and customer data are intentionally not baked into the images. They remain external assets expected to live in customer-managed persistent storage and are referenced through runtime configs.
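
To illustrate what "deduplicated execution images" could mean in practice, here is a minimal sketch, not the runner's actual code. The step names, the image group names, and the `plan_execution_images` helper are all hypothetical; the only idea taken from the description is that several selected steps may share one execution image, which then only needs to be built and exported once.

```python
# Hypothetical mapping from a selected step to the execution image group it runs in.
# Neither the step names nor the group names are taken from the PR's airgap.yaml.
STEP_IMAGE_GROUPS = {
    "sft/megatron_bridge": "megatron-bridge",
    "sdg/data_designer": "data-designer",
    "eval/example": "megatron-bridge",
}


def plan_execution_images(selected_steps, step_image_groups):
    """Group the selected steps by image group so each image appears once."""
    plan = {}
    for step in selected_steps:
        group = step_image_groups[step]
        # setdefault keeps first-seen order and avoids duplicate image entries.
        plan.setdefault(group, []).append(step)
    return plan


if __name__ == "__main__":
    plan = plan_execution_images(list(STEP_IMAGE_GROUPS), STEP_IMAGE_GROUPS)
    for group, steps in plan.items():
        print(f"{group}: {', '.join(steps)}")
```

In this sketch three selected steps resolve to two image groups, so only two execution images would need to be packaged.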

Changes

  • Add airgap runner, config, README, Dockerfiles, and dockerignore files under deploy/nemotron-customizer/airgap
  • Add airgap.yaml to describe selectable workflow stages and execution image groups
  • Add dependency discovery for execution images by probing selected step modules
  • Bake required repo overlays from step configs into derived execution images
  • Add SFT Megatron Bridge airgap overlay configs that avoid runtime GitHub mounts
  • Generate image manifests with role, image tag, tar path, and SHA256 checksums
  • Add resumable build state files for interrupted airgap packaging runs
  • Rename terminology from submitter/task images to launcher/execution images
  • Add focused tests for runner behavior and image planning
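
For reviewers who want a feel for the manifest item above, the following is a rough sketch of a manifest builder that records role, image tag, tar path, and SHA256 checksum. The field names, the `build_manifest` and `sha256_of_file` helpers, and the JSON layout are assumptions for illustration only and may differ from what the runner actually writes.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large image tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(images):
    """Build a manifest dict from (role, image_tag, tar_path) tuples.

    The keys used here are hypothetical, chosen to mirror the fields
    named in the PR description.
    """
    entries = []
    for role, image_tag, tar_path in images:
        entries.append(
            {
                "role": role,
                "image": image_tag,
                "tar": str(tar_path),
                "sha256": sha256_of_file(tar_path),
            }
        )
    return {"images": entries}


def write_manifest(manifest, manifest_path):
    """Serialize the manifest as indented JSON."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
```

On the airgapped side, the same `sha256_of_file` helper could be rerun against each transferred tarball and compared with the manifest before loading the image.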

Validation

  • uv run pytest tests/deploy/test_airgap_runner.py

Result:

  • 17 passed

Comment thread src/nemotron/steps/sdg/data_designer/step.py
rapaul-nv added 3 commits May 11, 2026 10:26
- Add deploy-scoped airgap tooling for Nemotron Customizer steps under
  src/nemotron/steps.
- Build a portable submitter image plus deduplicated task images for selected
  workflow targets.
- Expand step dependencies and map selected steps to task image families through
  a single airgap.yaml.
- Discover small task-image Python dependency gaps and bake pinned repo overlays
  required by step configs.
- Keep models, datasets, checkpoints, and customer data in external persistent
  storage managed by the user.
- Add resumable build state, image manifests with checksums, Dockerfiles, SFT
  overlay configs, README guidance, and focused tests.

Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
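
The resumable build state mentioned in the commit above could work along these lines. This is a sketch under assumptions: the state file layout (a JSON object with a `completed` list), the stage names, and the `run_stages` helper are hypothetical and are not taken from the runner.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Return the saved state, or an empty one if no state file exists yet."""
    path = Path(state_path)
    if not path.exists():
        return {"completed": []}
    return json.loads(path.read_text())


def run_stages(stages, state_path):
    """Run (name, callable) stages in order, skipping those already completed.

    The state file is rewritten after every successful stage, so an
    interrupted run can pick up at the first unfinished stage.
    """
    state = load_state(state_path)
    for name, action in stages:
        if name in state["completed"]:
            continue
        action()
        state["completed"].append(name)
        Path(state_path).write_text(json.dumps(state, indent=2))
    return state
```

If a stage raises, the stages recorded before it stay in the state file, and calling `run_stages` again with the same path resumes from the failed stage.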
- Rename airgap artifacts to use launcher and execution image terminology
- Update runner stages, manifests, README, and config keys to match the new naming
- Keep execution image generation scoped to selected Nemotron Customizer steps
- Preserve external handling for models, datasets, checkpoints, and customer storage paths
- Refresh SFT Megatron Bridge airgap overlay configs
- Update tests for launcher/execution image behavior and staged runner flow

Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
- Install git and CA certificates in the launcher image before uv sync
- Capture only docker inspect stdout while suppressing stderr during platform checks
- Keep the airgap runner platform probe compatible with subprocess stderr handling

Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
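
The platform-probe change in the commit above (capture stdout, discard stderr) can be sketched as follows. The `probe_stdout` and `image_platform` helpers are hypothetical names, not the runner's functions. One relevant Python detail: `subprocess.run` raises `ValueError` when `capture_output=True` is combined with an explicit `stdout` or `stderr` argument, so this sketch sets the two streams individually.

```python
import subprocess


def probe_stdout(command):
    """Run a command and return its stripped stdout, or None on failure.

    stderr is discarded so warnings do not leak into the parsed value.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def image_platform(image):
    """Return 'os/arch' for a local image, or None if inspect fails.

    Assumes the docker CLI is on PATH; a missing binary raises FileNotFoundError.
    """
    return probe_stdout(
        ["docker", "image", "inspect", "--format", "{{.Os}}/{{.Architecture}}", image]
    )
```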
rapaul-nv force-pushed the rapaul/airgap-support branch from c7dedb0 to 4c02460 on May 11, 2026 04:57
rapaul-nv added 2 commits May 11, 2026 15:06
- Move generic step commands and backends from commands/step to commands/steps
- Register only `nemotron steps`; remove the singular `nemotron step` alias
- Expose `steps list`, `steps show`, `steps run`, and `steps translation`
- Update imports, tests, docs, skills, and config examples to the plural CLI
- Add coverage for plural command registration and singular alias rejection

Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
2 participants