Refine Nemotron Customizer airgap image flow

- Rename airgap artifacts to use launcher and execution image terminology
- Update runner stages, manifests, README, and config keys to match the new naming
- Keep execution image generation scoped to selected Nemotron Customizer steps
- Preserve external handling for models, datasets, checkpoints, and customer storage paths
- Refresh SFT Megatron Bridge airgap overlay configs
- Update tests for launcher/execution image behavior and staged runner flow

Signed-off-by: Rakesh Paul <rapaul@nvidia.com>
NVIDIA-NeMo · rapaul-nv · May 8, 2026 · May 11, 2026
commit f4b8f50910c616d8fe48842ef2cf2ea3fc4bed1b
diff --git a/...emotron-customizer/airgap/Dockerfile.task → ...on-customizer/airgap/Dockerfile.execution b/...emotron-customizer/airgap/Dockerfile.task → ...on-customizer/airgap/Dockerfile.execution
@@ -1,11 +1,11 @@
-# Derivative task image for Nemotron Customizer airgap.
+# Derivative execution image for Nemotron Customizer airgap.
 # Built from the real training/runtime image and only adds small missing
 # wrapper packages.
 
 ARG BASE_IMAGE
 FROM ${BASE_IMAGE}
 
-ARG TASK_REQUIREMENTS
+ARG EXECUTION_REQUIREMENTS
 ARG REPO_OVERLAYS
 ARG REPO_OVERLAYS_DIR
 ARG PYTHON_BIN=python
@@ -16,16 +16,16 @@ ENV TRANSFORMERS_OFFLINE=1
 ENV HF_DATASETS_OFFLINE=1
 ENV WANDB_MODE=offline
 
-COPY ${TASK_REQUIREMENTS} /opt/nemotron-airgap/task-requirements.txt
+COPY ${EXECUTION_REQUIREMENTS} /opt/nemotron-airgap/execution-requirements.txt
 COPY ${REPO_OVERLAYS} /opt/nemotron-airgap/repo-overlays.json
 COPY ${REPO_OVERLAYS_DIR}/ /opt/nemotron-airgap/repo-overlays/
 
 # Build-time installs keep --no-cache-dir so derivative image layers stay small.
-RUN if [ -s /opt/nemotron-airgap/task-requirements.txt ]; then \
+RUN if [ -s /opt/nemotron-airgap/execution-requirements.txt ]; then \
       if [ "${PIP_NO_DEPS}" = "true" ]; then \
-        ${PYTHON_BIN} -m pip install --no-cache-dir --no-deps -r /opt/nemotron-airgap/task-requirements.txt; \
+        ${PYTHON_BIN} -m pip install --no-cache-dir --no-deps -r /opt/nemotron-airgap/execution-requirements.txt; \
       else \
-        ${PYTHON_BIN} -m pip install --no-cache-dir -r /opt/nemotron-airgap/task-requirements.txt; \
+        ${PYTHON_BIN} -m pip install --no-cache-dir -r /opt/nemotron-airgap/execution-requirements.txt; \
       fi; \
     fi && \
     ${PYTHON_BIN} - <<'PY'

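Review note: the renamed build argument has to be passed under its new name by whatever invokes `docker build`. A minimal sketch of assembling that command in Python follows. It is not the runner's actual code: the helper name `execution_build_command`, the example paths, and the example tag are hypothetical, and `PIP_NO_DEPS` is assumed to be a build argument because its declaration is not shown in this hunk.

```python
def execution_build_command(
    dockerfile: str,
    context_dir: str,
    base_image: str,
    tag: str,
    requirements: str,
    repo_overlays: str,
    repo_overlays_dir: str,
    pip_no_deps: bool = False,
    python_bin: str = "python",
) -> list[str]:
    """Build the argv for `docker build` against Dockerfile.execution."""
    build_args = {
        "BASE_IMAGE": base_image,
        # Renamed in this commit (was TASK_REQUIREMENTS).
        "EXECUTION_REQUIREMENTS": requirements,
        "REPO_OVERLAYS": repo_overlays,
        "REPO_OVERLAYS_DIR": repo_overlays_dir,
        "PYTHON_BIN": python_bin,
        # Assumption: PIP_NO_DEPS is declared as an ARG outside this hunk.
        "PIP_NO_DEPS": "true" if pip_no_deps else "false",
    }
    cmd = ["docker", "build", "-f", dockerfile, "-t", tag]
    for key, value in build_args.items():
        cmd += ["--build-arg", f"{key}={value}"]
    cmd.append(context_dir)
    return cmd


# Hypothetical paths and tag, for illustration only.
cmd = execution_build_command(
    dockerfile="deploy/nemotron-customizer/airgap/Dockerfile.execution",
    context_dir=".",
    base_image="nvcr.io/nvidia/nemo:25.11.nemotron_3_nano",
    tag="nemotron-customizer-nemo-megatron-airgap:latest",
    requirements="deploy/nemotron-customizer/airgap/out/execution-context/requirements.txt",
    repo_overlays="deploy/nemotron-customizer/airgap/out/execution-context/repo-overlays.json",
    repo_overlays_dir="deploy/nemotron-customizer/airgap/out/repo-overlays",
)
print(" ".join(cmd))
```

Passing the old `TASK_REQUIREMENTS` name would leave `EXECUTION_REQUIREMENTS` empty at build time, so any caller outside this diff is worth checking for the old name.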
diff --git a/...mizer/airgap/Dockerfile.task.dockerignore → .../airgap/Dockerfile.execution.dockerignore b/...mizer/airgap/Dockerfile.task.dockerignore → .../airgap/Dockerfile.execution.dockerignore
@@ -4,8 +4,8 @@
 !deploy/nemotron-customizer
 !deploy/nemotron-customizer/airgap
 !deploy/nemotron-customizer/airgap/out
-!deploy/nemotron-customizer/airgap/out/task-context
-!deploy/nemotron-customizer/airgap/out/task-context/**
+!deploy/nemotron-customizer/airgap/out/execution-context
+!deploy/nemotron-customizer/airgap/out/execution-context/**
 !deploy/nemotron-customizer/airgap/out/repo-overlays
 !deploy/nemotron-customizer/airgap/out/repo-overlays/**
 

diff --git a/...on-customizer/airgap/Dockerfile.submitter → ...ron-customizer/airgap/Dockerfile.launcher b/...on-customizer/airgap/Dockerfile.submitter → ...ron-customizer/airgap/Dockerfile.launcher
@@ -1,4 +1,4 @@
-# Submitter image for Nemotron Customizer airgap.
+# Launcher image for Nemotron Customizer airgap.
 # It contains the repo and a uv-synced environment. It does not run training.
 
 ARG BASE_IMAGE=python:3.12-slim

diff --git a/.../airgap/Dockerfile.submitter.dockerignore → ...r/airgap/Dockerfile.launcher.dockerignore b/.../airgap/Dockerfile.submitter.dockerignore → ...r/airgap/Dockerfile.launcher.dockerignore
diff --git a/deploy/nemotron-customizer/airgap/README.md b/deploy/nemotron-customizer/airgap/README.md
@@ -5,22 +5,22 @@ This folder is scoped only to Nemotron Customizer steps under
 
 The flow is intentionally small:
 
-1. Build one **submitter image** with this repo and `uv.lock`.
-2. Build one or more **task images** by grouping selected workflow stages by base image.
+1. Build one **launcher image** with this repo and `uv.lock`.
+2. Build one or more **execution images** by grouping selected workflow stages by base image.
 3. Save those images as tarballs for the airgapped side.
 4. Keep models, datasets, checkpoints, and customer files on persistent storage.
 
 Edit `airgap.yaml` first:
 
 - `workflow.stages`: the Nemotron Customizer steps the customer wants to run
 - `dependencies`: central step dependency map, for example SFT training needs SFT packing
-- `step_images`: which task image each step should use
-- `task_images`: the base image, output tag, and known/import-probed Python requirements
+- `step_execution_images`: which execution image each step should use
+- `execution_images`: the base image, output tag, and known/import-probed Python requirements
 
 Only steps reached from `workflow.stages` are built. Steps are grouped by
 `base_image + repo_overlays`; each group gets one derivative image with the
 union of its small missing packages. If two selected step families share the
-same base image and repo overlays, the runner emits one combined task image for
+same base image and repo overlays, the runner emits one combined execution image for
 both.
 
 Run from the repo root:
@@ -45,7 +45,7 @@ To run only a few stages:
 uv run python deploy/nemotron-customizer/airgap/runner.py \
   --config deploy/nemotron-customizer/airgap/airgap.yaml \
   --stage validate \
-  --stage discover-task-deps
+  --stage discover-execution-deps
 ```
 
 To override the workflow without editing YAML, pass one or more selected
@@ -63,18 +63,18 @@ uv run python deploy/nemotron-customizer/airgap/runner.py \
 Outputs are written under `deploy/nemotron-customizer/airgap/out/` by default:
 
 - `airgap-manifest.yaml`: what was validated and built
-- `airgap-progress.yaml`: incomplete execute run state used for resume
-- `airgap-complete.yaml`: final execute run state after success
-- `requirements-<task-group>.txt`: small missing packages per task image
-- `repo-overlays-<task-group>.json`: git auto-mounts discovered from selected step configs
-- `submitter-image.tar`
-- `task-*.tar`
+- `airgap-build-state.yaml`: incomplete execute run state used for resume
+- `airgap-build-complete.yaml`: final execute run state after success
+- `requirements-<execution-group>.txt`: small missing packages per execution image
+- `repo-overlays-<execution-group>.json`: git auto-mounts discovered from selected step configs
+- `launcher-image.tar`
+- `execution-*.tar`
 - SHA256 checksums for saved image tarballs in `airgap-manifest.yaml`
 
-If an execute run fails midway, leave `airgap-progress.yaml` in place and rerun
+If an execute run fails midway, leave `airgap-build-state.yaml` in place and rerun
 the same command. Completed expensive actions are reused when their artifacts
 still exist. If you intentionally change the workflow or image plan before
-finishing, move or remove `airgap-progress.yaml` first; the runner will not
+finishing, move or remove `airgap-build-state.yaml` first; the runner will not
 silently overwrite incomplete state from a different plan.
 
 Runtime dependency probes use Docker volumes named
@@ -88,19 +88,19 @@ executor-visible persistent storage and reference them through config overrides
 and `run.env.mounts`.
 
 During dependency discovery, the runner mounts the connected-machine checkout
-into each task image only to probe imports. The final task image deliberately
-does not bake this repo; the submitter image and the normal nemo-run/nemo-runspec
+into each execution image only to probe imports. The final execution image deliberately
+does not bake this repo; the launcher image and the normal nemo-run/nemo-runspec
 code transport provide the repo to the remote job at submission time.
 
 Repo logistics stay outside `airgap.yaml`. If a selected step config contains
 `${auto_mount:git+...}`, the runner treats it as a connected-machine build input:
-it fetches that pinned repo and bakes it into the derivative task image at the
+it fetches that pinned repo and bakes it into the derivative execution image at the
 requested target path. Runtime jobs then use the baked image and do not clone
 from GitHub. Site-specific data/model mounts remain in env profiles or step
 overrides.
 
 If the connected machine is not the same architecture as the target cluster,
-set `platform: linux/amd64` on the submitter or task image entry in
+set `platform: linux/amd64` on the `launcher_image` or execution image entry in
 `airgap.yaml`. If you need to minimize transfer size for several images that
 share layers, `docker save -o all-images.tar tag1 tag2 ...` can be used after
 the runner builds the images; a single tar deduplicates shared layers better
@@ -124,8 +124,8 @@ workflow:
 When submitting inside the airgap, use the deploy overlay config so those git
 auto-mounts are cleared at runtime while persistent storage mounts from the env
 profile still apply. Use the image printed by the runner under
-`selected step images`, or read it from `out/airgap-manifest.yaml` under
-`step_images`.
+`selected execution images`, or read it from `out/airgap-manifest.yaml` under
+`step_execution_images`.
 
 ```bash
 uv run nemotron step run sft/megatron_bridge \

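Review note: the README says `airgap-manifest.yaml` records SHA256 checksums for the saved tarballs, but it does not show how to check them after transfer. A minimal sketch of that check follows. The manifest layout is not visible in this diff, so the expected digest is passed in directly rather than parsed from YAML; `sha256_of` and `verify_tarball` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path: Path, expected_sha256: str) -> bool:
    """Compare a tarball's digest with the value copied from the manifest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Demonstration with a stand-in file; a real run would point at launcher-image.tar
# or an execution-*.tar file and use the digest recorded in airgap-manifest.yaml.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "launcher-image.tar"
    payload = b"not a real image tarball"
    sample.write_bytes(payload)
    expected = hashlib.sha256(payload).hexdigest()
    ok = verify_tarball(sample, expected)
    tampered = verify_tarball(sample, "0" * 64)

print(ok, tampered)
```

Running the check before `docker load` on the airgapped side catches truncated or corrupted transfers early.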
diff --git a/deploy/nemotron-customizer/airgap/airgap.yaml b/deploy/nemotron-customizer/airgap/airgap.yaml
@@ -2,7 +2,7 @@
 #
 # Change workflow.stages to the steps the customer wants. The runner expands
 # dependencies, validates those step files/configs, groups selected steps by
-# task image, then builds only the images needed for that selection.
+# execution image, then builds only the images needed for that selection.
 
 workflow:
   name: sft-megatron-bridge
@@ -18,18 +18,18 @@ workflow:
 
 build_stages:
   - validate
-  - discover-task-deps
-  - build-submitter
-  - build-task-images
+  - discover-execution-deps
+  - build-launcher-image
+  - build-execution-images
   - save-images
 
 paths:
   output_dir: deploy/nemotron-customizer/airgap/out
 
-submitter:
+launcher_image:
   base_image: python:3.12-slim
-  tag: nemotron-customizer-submit-airgap:latest
-  tar: submitter-image.tar
+  tag: nemotron-customizer-launcher-airgap:latest
+  tar: launcher-image.tar
 
 # Central dependency map. Keep this small and explicit: it is only for steps
 # that naturally require a previous Nemotron Customizer step output.
@@ -51,15 +51,15 @@ dependencies:
   # SDG can feed SFT or RL prep, but it is not forced as a dependency because
   # many customers bring their own JSONL on persistent storage.
 
-# Step -> task-image mapping. The runner only uses entries reached from
+# Step -> execution-image mapping. The runner only uses entries reached from
 # workflow.stages after dependency expansion.
-step_images:
+step_execution_images:
   byob: nemo-data-designer
   convert/hf_to_megatron: nemo-megatron
   convert/megatron_to_hf: nemo-megatron
   convert/merge_lora: nemo-megatron
   curate/nemo_curator: nemo-curator
-  env/env_toml: submitter-python
+  env/env_toml: launcher-python
   eval/model_eval: nemo-eval
   optimize/modelopt/distill: nemo-modelopt
   optimize/modelopt/prune: nemo-modelopt
@@ -80,51 +80,51 @@ step_images:
   translate/nemo_skills: nemo-curator
   translate/translation: nemo-curator
 
-task_images:
-  submitter-python:
+execution_images:
+  launcher-python:
     base_image: python:3.12-slim
-    tag: nemotron-customizer-python-task-airgap:latest
-    tar: task-python-image.tar
+    tag: nemotron-customizer-python-execution-airgap:latest
+    tar: execution-python-image.tar
 
   nemo-megatron:
     base_image: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
     tag: nemotron-customizer-nemo-megatron-airgap:latest
-    tar: task-nemo-megatron-image.tar
+    tar: execution-nemo-megatron-image.tar
     required_imports: []
 
   nemo-automodel:
     base_image: nvcr.io/nvidia/nemo-automodel:26.04
     tag: nemotron-customizer-nemo-automodel-airgap:latest
-    tar: task-nemo-automodel-image.tar
+    tar: execution-nemo-automodel-image.tar
     required_imports: []
 
   nemo-rl:
     base_image: nvcr.io/nvidia/nemo-rl:v0.6.0
     tag: nemotron-customizer-nemo-rl-airgap:latest
-    tar: task-nemo-rl-image.tar
+    tar: execution-nemo-rl-image.tar
     required_imports: []
 
   nemo-modelopt:
     base_image: nvcr.io/nvidia/nemo:26.02
     tag: nemotron-customizer-nemo-modelopt-airgap:latest
-    tar: task-nemo-modelopt-image.tar
+    tar: execution-nemo-modelopt-image.tar
     required_imports: []
 
   nemo-curator:
     base_image: nvcr.io/nvidia/nemo-curator:25.07
     tag: nemotron-customizer-nemo-curator-airgap:latest
-    tar: task-nemo-curator-image.tar
+    tar: execution-nemo-curator-image.tar
     required_imports: []
 
   nemo-data-designer:
     base_image: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
     tag: nemotron-customizer-nemo-data-designer-airgap:latest
-    tar: task-nemo-data-designer-image.tar
+    tar: execution-nemo-data-designer-image.tar
     required_imports:
       - data_designer
 
   nemo-eval:
     base_image: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
     tag: nemotron-customizer-nemo-eval-airgap:latest
-    tar: task-nemo-eval-image.tar
+    tar: execution-nemo-eval-image.tar
     required_imports: []
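
Review note: the README describes expanding `workflow.stages` through `dependencies` and then grouping steps by `base_image + repo_overlays`. A minimal sketch of that rule, using the renamed `step_execution_images` and `execution_images` keys, follows. It is not the runner's implementation: the dependency edge is hypothetical, repo overlays are modeled as plain lists of strings, and only a subset of the mapping above is included.

```python
from collections import defaultdict

# Subset of the mapping from airgap.yaml above.
STEP_EXECUTION_IMAGES = {
    "convert/hf_to_megatron": "nemo-megatron",
    "eval/model_eval": "nemo-eval",
    "curate/nemo_curator": "nemo-curator",
}
EXECUTION_IMAGES = {
    "nemo-megatron": {"base_image": "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"},
    "nemo-eval": {"base_image": "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"},
    "nemo-curator": {"base_image": "nvcr.io/nvidia/nemo-curator:25.07"},
}
# Hypothetical dependency edge, for illustration only.
DEPENDENCIES = {"eval/model_eval": ["convert/hf_to_megatron"]}


def expand_stages(stages, dependencies):
    """Return selected stages plus their dependencies, dependencies first."""
    ordered, seen = [], set()

    def visit(step):
        if step in seen:
            return
        seen.add(step)
        for dep in dependencies.get(step, []):
            visit(dep)
        ordered.append(step)

    for stage in stages:
        visit(stage)
    return ordered


def group_steps(steps, step_images, images, overlays_by_step=None):
    """Group steps that share a base image and the same set of repo overlays."""
    overlays_by_step = overlays_by_step or {}
    groups = defaultdict(list)
    for step in steps:
        image = images[step_images[step]]
        key = (image["base_image"], tuple(sorted(overlays_by_step.get(step, []))))
        groups[key].append(step)
    return dict(groups)


selected = expand_stages(["eval/model_eval"], DEPENDENCIES)
groups = group_steps(selected, STEP_EXECUTION_IMAGES, EXECUTION_IMAGES)
for (base_image, overlays), steps in groups.items():
    print(base_image, overlays, steps)
```

In this sketch `nemo-megatron` and `nemo-eval` share a base image and have no overlays, so both selected steps land in one group, which matches the README's "one combined execution image" case. `curate/nemo_curator` is never reached from the selected stage and is not built.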
diff --git a/deploy/nemotron-customizer/airgap/configs/sft_megatron_bridge_default.yaml b/deploy/nemotron-customizer/airgap/configs/sft_megatron_bridge_default.yaml
@@ -1,7 +1,7 @@
 # Airgap runtime overlay for sft/megatron_bridge:default.
 #
 # The connected-machine airgap runner bakes the auto_mount repos from the base
-# config into the derivative task image. At runtime, clear those git auto-mounts
+# config into the derivative execution image. At runtime, clear those git auto-mounts
 # so the airgapped job does not clone from GitHub. Env-profile persistent
 # storage mounts still append normally.
 

diff --git a/deploy/nemotron-customizer/airgap/configs/sft_megatron_bridge_tiny.yaml b/deploy/nemotron-customizer/airgap/configs/sft_megatron_bridge_tiny.yaml
@@ -1,7 +1,7 @@
 # Airgap runtime overlay for sft/megatron_bridge:tiny.
 #
 # The connected-machine airgap runner bakes the auto_mount repos from the base
-# config into the derivative task image. At runtime, clear those git auto-mounts
+# config into the derivative execution image. At runtime, clear those git auto-mounts
 # so the airgapped job does not clone from GitHub. Env-profile persistent
 # storage mounts still append normally.