Name	Name	Last commit message	Last commit date
parent directory ..
common	common
docs	docs
examples/Qwen/Qwen3-8B	examples/Qwen/Qwen3-8B
modules	modules
tests	tests
.gitignore	.gitignore
CLAUDE.md	CLAUDE.md
README.md	README.md
__init__.py	__init__.py
core.py	core.py
launch.py	launch.py
pyproject.toml	pyproject.toml
slurm_config.py	slurm_config.py

Name

Last commit message

Last commit date

common

docs

examples/Qwen/Qwen3-8B

ModelOpt Launcher

Submit ModelOpt quantization, training, and evaluation jobs to Slurm clusters or run them locally with Docker.

Quick Start

# Install
curl -LsSf https://astral.sh/uv/install.sh | sh
git submodule update --init --recursive

# Run locally with 1 GPU
cd Model-Optimizer/tools/launcher
uv run launch.py --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq_local.yaml hf_local=/mnt/hf-local --yes

# Run on a Slurm cluster (4 GPUs)
export SLURM_HOST=login-node.example.com
export SLURM_ACCOUNT=my_account
export SLURM_HF_LOCAL=/mnt/hf-local
export SLURM_JOB_DIR=/shared/experiments
uv run launch.py --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml --yes

Local vs cluster: megatron_lm_ptq.yaml defaults to TP=4 on 4 GPUs. Use megatron_lm_ptq_local.yaml for single-GPU local Docker runs.

Directory Structure

tools/launcher/
├── launch.py                       # Main entrypoint
├── core.py                         # Core logic (dataclasses, executors, run loop)
├── slurm_config.py                 # SlurmConfig dataclass and factory
├── common/                         # Scripts and typed tasks
│   ├── megatron_lm/quantize/
│   │   ├── quantize.sh             # PTQ quantization + MMLU evaluation
│   │   └── task.py                 # MegatronLMQuantizeTask (typed config)
│   ├── tensorrt_llm/query.sh       # TRT-LLM server + query
│   ├── vllm/query.sh               # vLLM server + query
│   ├── eagle3/                     # EAGLE3 speculative decoding scripts
│   └── specdec_bench/              # Speculative decoding benchmark
├── examples/                        # Example configs
│   └── Qwen/Qwen3-8B/
│   ├── megatron_lm_ptq.yaml        # PTQ (4 GPUs, Slurm)
│   ├── megatron_lm_ptq_local.yaml  # PTQ (1 GPU, local Docker)
│   └── hf_offline_eagle3.yaml      # EAGLE3 offline pipeline
├── tests/                          # 64 unit tests
├── modules/                        # Dependencies
│   ├── Megatron-LM/                # Git submodule
│   └── Model-Optimizer -> ../..    # Symlink (auto-created)
└── docs/                           # Documentation
    ├── configuration.md            # YAML formats, overrides, hf_local
    ├── architecture.md             # Design, factory system, typed tasks
    ├── testing.md                  # Running tests, CI
    ├── claude_code.md              # Claude Code workflows
    └── contributing.md             # Adding models, bug reporting

Documentation

Guide	Description
Configuration	YAML formats, CLI overrides, flags, `hf_local`
Architecture	Shared core, factory system, typed tasks, mount mechanism
Testing	Running tests locally and in CI
Claude Code	Submit, monitor, diagnose workflows
Contributing	Adding models, typed tasks, bug reporting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

ModelOpt Launcher

Quick Start

Directory Structure

Documentation

FilesExpand file tree

launcher

Directory actions

More options

Directory actions

More options

Latest commit

History

launcher

Folders and files

parent directory

README.md

ModelOpt Launcher

Quick Start

Directory Structure

Documentation