235 changes: 235 additions & 0 deletions AGENTS.md
# AGENTS.md -- Nemotron Repository Agent Context

## What This Repo Does

Nemotron is NVIDIA's open-source repository for reproducible LLM training pipelines. It provides:

1. **Training recipes** for NVIDIA model families (Nano3, Super3, Embed) -- full pretrain/SFT/RL pipelines
2. **Customization recipes** for adapting models to new languages, domains, and use cases (Sovereign AI Playbook)
3. **Data preparation** infrastructure for tokenization, packing, and format conversion
4. **Evaluation** via NeMo Evaluator with benchmark suites

## Repository Layout

```
Nemotron/
  AGENTS.md                          <-- You are here
  pyproject.toml                     <-- Package config; entry point: nemotron CLI
  src/
    nemo_runspec/                    <-- Config loading, execution, PEP 723 metadata parsing
    nemotron/
      cli/
        bin/nemotron.py              <-- CLI root (Typer app)
        commands/
          nano3/                     <-- Nano3 commands: pretrain, sft, rl, eval, pipe
          super3/                    <-- Super3 commands: pretrain, sft, rl (rlhf/rlvr/swe)
          embed/                     <-- Embedding model commands: sdg, prep, finetune, eval, export, deploy
          customize/                 <-- Customization CLI: translate, data-prep, cpt, sft, sdg, rl, byob, eval, quantize
          kit/                       <-- CLI utilities (app, squash)
      kit/                           <-- Domain toolkit: Artifact types, lineage tracking, W&B, recipe loading
      data_prep/                     <-- Distributed data prep library (bin/idx, packed parquet, JSONL)
      recipes/
        nano3/                       <-- Nano3 recipe scripts + configs
          stage0_pretrain/           <-- train.py, data_prep.py, config/
          stage1_sft/
          stage2_rl/
          stage3_eval/
        super3/                      <-- Super3 recipe scripts + configs
          stage0_pretrain/
          stage1_sft/
          stage2_rl/                 <-- Sub-stages: rlvr, swe1, swe2, rlhf
          stage3_eval/
        embed/                       <-- Embedding model recipes
          stage0_sdg/ .. stage5_deploy/
        data_curation/               <-- NeMo Curator recipes (nemotron-cc)
      customization_recipes/         <-- Sovereign AI customization pipelines
        nemotron/                    <-- Nemotron model customization (7 stages: 0-6)
          SKILL.md                   <-- E2E customization pipeline skill definition
          stage0_data_prep/          <-- Data Preparation & Translation
          stage1_cpt/                <-- Continued Pretraining
          stage2_sft/                <-- Supervised Fine-Tuning + SDG
          stage3_rl/                 <-- Reinforcement Learning (DPO/GRPO)
          stage4_byob/               <-- Build Your Own Benchmark
          stage5_eval/               <-- Evaluation
          stage6_quantization/       <-- Quantization for deployment
        llama/                       <-- Llama model customization (same stage structure)
        qwen/                        <-- Qwen model customization (same stage structure)
        data_prep/                   <-- Shared data prep utilities for customization
  tests/
  docs/
  deploy/                            <-- Deployment configs (Docker, Helm)
  tools/
  usage-cookbook/
  use-case-examples/
```

## Key Infrastructure

### nemotron CLI

Entry point: `nemotron` (defined in `pyproject.toml` as `nemotron.__main__:main`).

```bash
# Pattern: nemotron <model> <stage> [options] [overrides]
nemotron nano3 pretrain -c default # Local execution
nemotron nano3 pretrain -c default --run MY-CLUSTER # Remote via nemo-run (attached)
nemotron nano3 pretrain -c default --batch MY-CLUSTER # Remote via nemo-run (detached)
nemotron nano3 pretrain -c default --dry-run # Preview compiled config
nemotron nano3 sft -c default --run MY-CLUSTER train.train_iters=5000 # Override params
nemotron nano3 pipe --run MY-CLUSTER # Compose pretrain + sft
nemotron nano3 eval --run MY-CLUSTER # Run evaluation suite

# Data prep (run directly, not via CLI)
python src/nemotron/recipes/nano3/stage0_pretrain/data_prep.py --config <yaml>
```

Global options: `-c/--config`, `-r/--run`, `-b/--batch`, `-d/--dry-run`, `--stage`, `--force-squash`.

### nemo_runspec

Module: `src/nemo_runspec/`

Parses PEP 723 `[tool.runspec]` metadata from recipe scripts. Provides:
- `nemo_runspec.parse(script_path)` -- returns `Runspec` with name, image, config_dir, resources
- `nemo_runspec.config` -- OmegaConf YAML loading, job config building, artifact URI resolution
- `nemo_runspec.execution` -- local (torchrun) and remote (Slurm/Lepton/Run:AI/Ray via nemo-run) execution
- `nemo_runspec.packaging` -- SelfContainedPackager for remote code shipping

Config resolution chain: script `[tool.runspec]` -> `config/<name>.yaml` -> `env.toml` profile -> CLI overrides.
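A recipe script declares this metadata inline at the top of the file as a PEP 723 block. A minimal sketch (the field names follow the `Runspec` attributes listed above; the concrete values and the `requires-python` line are illustrative assumptions, not taken from the repo):

```python
# /// script
# requires-python = ">=3.10"
#
# [tool.runspec]
# name = "my-stage-pretrain"                            # hypothetical recipe name
# image = "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
# config_dir = "config"
# ///
```

`nemo_runspec.parse(script_path)` reads this block and returns the corresponding `Runspec`.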

### nemotron.kit

Module: `src/nemotron/kit/`

Domain-specific toolkit:
- `nemotron.kit.Artifact` -- base class for typed artifacts (pydantic)
- `nemotron.kit.ModelArtifact`, `PretrainDataArtifact`, `SFTDataArtifact` -- typed artifact classes
- `nemotron.kit.init(backend="fsspec"|"wandb", root=...)` -- initialize artifact registry
- `nemotron.kit.recipe_loader` -- `import_recipe_function(target)`, `extract_recipe_config(config)`
- `nemotron.kit.train_script` -- `parse_config_and_overrides()`, `load_omegaconf_yaml()`, `apply_hydra_overrides()`
- `nemotron.kit.wandb_kit` -- W&B initialization, monkey patches, lineage tracking

### nemotron.data_prep

Module: `src/nemotron/data_prep/`

Distributed data prep built on cosmos-xenna pipelines:
- `nemotron.data_prep.api` -- `run_pretrain_pipeline()`, `run_sft_pipeline()`
- Three-phase pattern: `setup_*_run()` -> xenna pipeline stages -> `finalize_*_run()`
- Output formats: bin/idx (pretrain), packed Parquet (SFT), JSONL (RL)
- Stages: PlanStage -> DownloadStage -> terminal stage (BinIdxTokenization / PackedSftParquet / JsonlShard)
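The three-phase pattern can be sketched generically. Every name below is a stand-in for illustration only; the real entry points are `run_pretrain_pipeline()` / `run_sft_pipeline()` and the `setup_*_run()` / `finalize_*_run()` helpers named above:

```python
# Generic setup -> staged pipeline -> finalize sketch; NOT the nemotron.data_prep API.

def setup_run(shards):
    # Phase 1: plan the work (e.g. decide which shards need processing)
    return {"pending": list(shards), "results": []}

def run_stages(state, stages):
    # Phase 2: push every work item through each pipeline stage in order
    for item in state["pending"]:
        for stage in stages:
            item = stage(item)
        state["results"].append(item)
    return state

def finalize_run(state):
    # Phase 3: aggregate outputs (e.g. write a global index over all shards)
    return sorted(state["results"])

state = setup_run(["  Alpha", "BETA  "])
state = run_stages(state, [str.strip, str.lower])
print(finalize_run(state))  # ['alpha', 'beta']
```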

## Task Routing

| Task | Go to |
|------|-------|
| Train Nano3 from scratch | `src/nemotron/recipes/nano3/` |
| Train Super3 from scratch | `src/nemotron/recipes/super3/` |
| Train embedding model | `src/nemotron/recipes/embed/` |
| Curate web data (CommonCrawl) | `src/nemotron/recipes/data_curation/nemotron-cc/` |
| Translate data for customization | `src/nemotron/customization_recipes/nemotron/stage0_data_prep/SKILL.md` |
| Customize Nemotron for a language/domain | `src/nemotron/customization_recipes/nemotron/SKILL.md` |
| Customize Llama for a language/domain | `src/nemotron/customization_recipes/llama/SKILL.md` |
| Customize Qwen for a language/domain | `src/nemotron/customization_recipes/qwen/SKILL.md` |
| Prepare training data (tokenize, pack) | `src/nemotron/data_prep/` |
| Add a new CLI command | `src/nemotron/cli/commands/` + register in `cli/bin/nemotron.py` |
| Add a new recipe | Create `<stage>/train.py` with `[tool.runspec]` + `<stage>/config/default.yaml` |
| Modify execution backend | Edit `_execute_*()` in the relevant CLI command module |
| Evaluate a model | `src/nemotron/recipes/<model>/stage*_eval/` |
| Build custom benchmarks (MCQ) | `src/nemotron/customization_recipes/nemotron/stage4_byob/SKILL.md` |
| Quantize a model | `src/nemotron/customization_recipes/nemotron/stage6_quantization/SKILL.md` |
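For the "Add a new recipe" row, a new stage's `config/default.yaml` might start as small as this sketch (only `train.train_iters` appears elsewhere in this file; the other key names are assumptions, not repo conventions):

```yaml
# Illustrative starting point for <stage>/config/default.yaml.
train:
  train_iters: 1000
data:
  path: ${art:data,path}   # artifact URI, resolved at config load time
```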

## SKILL.md References

| Skill | Path |
|-------|------|
| E2E Nemotron Customization | `src/nemotron/customization_recipes/nemotron/SKILL.md` |
| Stage 0: Data Preparation & Translation | `src/nemotron/customization_recipes/nemotron/stage0_data_prep/SKILL.md` |
| Stage 1: Continued Pretraining | `src/nemotron/customization_recipes/nemotron/stage1_cpt/SKILL.md` |
| Stage 2: SFT + SDG | `src/nemotron/customization_recipes/nemotron/stage2_sft/SKILL.md` |
| Stage 3: RL (DPO/GRPO) | `src/nemotron/customization_recipes/nemotron/stage3_rl/SKILL.md` |
| Stage 4: BYOB Benchmarks | `src/nemotron/customization_recipes/nemotron/stage4_byob/SKILL.md` |
| Stage 5: Evaluation | `src/nemotron/customization_recipes/nemotron/stage5_eval/SKILL.md` |
| Stage 6: Quantization | `src/nemotron/customization_recipes/nemotron/stage6_quantization/SKILL.md` |
| Shared Data Prep | `src/nemotron/customization_recipes/data_prep/SKILL.md` |
| Llama Customization | `src/nemotron/customization_recipes/llama/SKILL.md` |
| Qwen Customization | `src/nemotron/customization_recipes/qwen/SKILL.md` |

## Execution Backends

| Backend | Flag | Infrastructure | Notes |
|---------|------|---------------|-------|
| Local | (default) | torchrun on local GPUs | For dev/debug; single-node |
| Docker | `--run <profile>` | nemo-run + DockerExecutor | Local GPU container execution |
| Slurm (attached) | `--run <profile>` | nemo-run + SlurmExecutor | Logs streamed to terminal |
| Slurm (detached) | `--batch <profile>` | nemo-run + SlurmExecutor | Submit and exit |
| Lepton (DGX Cloud) | `--run <profile>` | nemo-run + LeptonExecutor | DGX Cloud via Lepton API; requires `node_group` |
| Run:AI | `--run <profile>` | nemo-run + KubeflowExecutor | Kubernetes GPU orchestration via Run:AI; requires `cluster` + `project` |
| Ray | (auto for RL) | nemo-run + RayJob | Used by GRPO/RL stages |

Env profiles are stored in `env.toml` at repo root (not checked in). Examples:

```toml
# --- Slurm cluster ---
[MY-CLUSTER]
executor = "slurm"
host = "login.cluster.example.com"
user = "myuser"
account = "myaccount"
partition = "batch"
remote_job_dir = "/lustre/myuser/jobs"
container = "nvcr.io/nvidia/nemo:26.02.super.rc1"
gpus_per_node = 8
nodes = 2

[MY-CLUSTER.wandb]
entity = "my-team"
project = "my-project"

# --- Lepton (DGX Cloud) ---
[lepton-dgx]
executor = "lepton"
container_image = "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
node_group = "my-dgx-group"
resource_shape = "gpu.8xh100-80gb"
nodes = 2
gpus_per_node = 8

[[lepton-dgx.mounts]]
path = "/shared-storage/data"
mount_path = "/data"

# --- Run:AI (Kubernetes) ---
[runai-cluster]
executor = "runai"
container_image = "nvcr.io/nvidia/nemo:25.11.nemotron_3_nano"
cluster = "my-runai-cluster"
project = "my-team"
nodes = 2
gpus_per_node = 8
node_pool = "h100-pool"

[[runai-cluster.pvc_mounts]]
name = "training-data-pvc"
mount_path = "/data"
```

## Config Resolution Order

1. Recipe script `[tool.runspec]` PEP 723 metadata (name, image, config_dir, default config)
2. YAML config file from `config/` directory (selected via `-c` flag)
3. `env.toml` profile (selected via `--run`/`--batch` flag) -- merged into `run.env`
4. CLI key=value overrides (Hydra-style, e.g., `train.train_iters=5000`)
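The final override step can be illustrated with a minimal stand-in for dotted-key assignment (the real logic lives in `nemotron.kit.train_script.apply_hydra_overrides()`; this sketch only shows the `key=value` semantics):

```python
def apply_override(config: dict, dotted_key: str, value) -> None:
    """Set a nested dict value from a dotted path, e.g. 'train.train_iters'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate tables as needed
    node[leaf] = value

config = {"train": {"train_iters": 1000, "lr": 3e-4}}
apply_override(config, "train.train_iters", 5000)
print(config["train"]["train_iters"])  # 5000
```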

Artifact URIs (`${art:data,path}`, `${art:model,path}`) are resolved at config load time via `nemo_runspec.config.resolvers`.
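An interpolation of this shape can be sketched with a toy resolver (the registry dict and paths below are invented for illustration; actual resolution is performed by `nemo_runspec.config.resolvers`):

```python
import re

# Hypothetical artifact registry: (kind, field) -> resolved value.
REGISTRY = {
    ("data", "path"): "/lustre/datasets/pretrain",
    ("model", "path"): "/lustre/ckpts/base",
}

def resolve_art(text: str) -> str:
    """Expand ${art:<kind>,<field>} placeholders using the registry."""
    def repl(match):
        kind, field = match.group(1), match.group(2)
        return REGISTRY[(kind, field)]
    return re.sub(r"\$\{art:(\w+),(\w+)\}", repl, text)

print(resolve_art("data_dir: ${art:data,path}"))  # data_dir: /lustre/datasets/pretrain
```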

## Container Images

| Model | Stage | Image |
|-------|-------|-------|
| Nano3 | Pretrain/SFT | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` |
| Nano3 | RL | `nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano` |
| Super3 | Pretrain/SFT | `nvcr.io/nvidia/nemo:26.02.super.rc1` |
| Customization | CPT/SFT | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` (or model-specific) |
| Customization | SDG | Requires NeMo DataDesigner |
| Customization | Eval | NeMo Evaluator launcher pulls its own containers |
105 changes: 105 additions & 0 deletions deploy/nemotron/customization_recipes/Dockerfile
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# =============================================================================
# Nemotron Orchestrator Container (nemotron-orchestrator)
#
# Lightweight CLI + orchestration container. Routes work to the curator,
# trainer, evaluator, and NIM service containers. Does NOT include heavy
# ML frameworks (NeMo, Megatron, PyTorch) -- those live in dedicated
# service containers.
#
# This is part of the multi-container customization deployment:
# - nemotron-orchestrator (this image) — CLI, orchestration, Docker client
# - nemotron-curator — NeMo Curator, data prep, SDG, BYOB
# - nemotron-trainer — NeMo + Megatron, CPT/SFT/RL training
# - nemotron-evaluator — Model evaluation, benchmarks
# - nemotron-nim — NIM for local LLM inference
#
# Build:
# docker compose build nemotron-orchestrator
#
# Run:
# docker compose run --rm nemotron-orchestrator nemotron customize --help
# =============================================================================

FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04

ARG REMOTE_USER=nemotron
ARG REMOTE_UID=1000
ARG REMOTE_GID=1000

# Create the user/group (ignore if they already exist)
RUN groupadd --gid $REMOTE_GID $REMOTE_USER -f && \
    if [ -z "$(id -u $REMOTE_UID 2>/dev/null)" ]; then \
        useradd --uid $REMOTE_UID --gid $REMOTE_GID -m $REMOTE_USER; \
    fi

# System dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        sudo \
        ca-certificates \
        curl \
        git \
        git-lfs \
        wget \
        unzip \
        python3 \
        python3-pip \
        python3-dev \
    && update-ca-certificates \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && rm -rf /var/lib/apt/lists/*

# Add user to sudoers
RUN REAL_USER=$(id -u -n ${REMOTE_UID} 2>/dev/null || echo $REMOTE_USER) && \
    echo "$REAL_USER ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$REAL_USER && \
    chmod 0440 /etc/sudoers.d/$REAL_USER

# Install Docker CLI (for orchestrating other containers)
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg && \
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu jammy stable" \
        > /etc/apt/sources.list.d/docker.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends docker-ce-cli docker-compose-plugin && \
    rm -rf /var/lib/apt/lists/*

# Install NGC CLI (required for data-designer persona downloads and model access)
RUN cd /tmp && \
    wget -q -O ngccli_linux.zip https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.4/files/ngccli_linux.zip && \
    unzip -q ngccli_linux.zip && \
    mv ngc-cli /opt/ngc-cli && \
    rm ngccli_linux.zip
ENV PATH="/opt/ngc-cli:${PATH}"

# Copy the Nemotron repo into the container
COPY --chown=$REMOTE_UID:$REMOTE_GID . /workspace/nemotron

WORKDIR /workspace/nemotron

# Install Nemotron CLI (lightweight — no heavy ML deps)
# The [customize] extras pull in orchestration + config deps only;
# heavy training/inference deps are in the trainer/curator containers.
RUN pip install --no-cache-dir -e ".[customize]"

# Mark this container as the orchestrator so the dispatcher knows to route
# commands to sibling containers via docker exec instead of running locally.
ENV NEMOTRON_ORCHESTRATOR=1
ENV NEMOTRON_CONTAINER=orchestrator

# Switch to the user
USER $REMOTE_UID

CMD ["tail", "-f", "/dev/null"]