Skip to content

Support for modelopt with MoE QAT#3866

Merged
yaoyu-33 merged 2 commits into
NVIDIA-NeMo:mainfrom
HollowMan6:qat_patch
May 20, 2026
Merged

Support for modelopt with MoE QAT#3866
yaoyu-33 merged 2 commits into
NVIDIA-NeMo:mainfrom
HollowMan6:qat_patch

Conversation

@HollowMan6
Copy link
Copy Markdown
Member

@HollowMan6 HollowMan6 commented May 17, 2026

What does this PR do ?

Move all the patches for supporting modelopt based QAT on verl side to Megatron Bridge.

Changelog

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)
image

Copilot AI review requested due to automatic review settings May 17, 2026 21:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support paths for ModelOpt/QAT MoE conversion behavior in Megatron Bridge, especially around local expert naming, quantized module detection, and related unit coverage.

Changes:

  • Extends expert-number extraction and sorting to handle local_experts.<N> names.
  • Updates conversion task filtering and EP name-globalization paths for adapter/quantizer handling.
  • Registers quantized parallel module type detection and adds QAT bridge support tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit_tests/models/test_qat_bridge_support.py Adds unit coverage for local expert naming, quantized mappings, adapter filtering, and ModelOpt layer spec exposure.
src/megatron/bridge/utils/common_utils.py Extends expert number extraction for local expert parameter names.
src/megatron/bridge/models/conversion/utils.py Extends deterministic sort keys for SequentialMLP local experts.
src/megatron/bridge/models/conversion/peft_bridge.py Adds shared base-weight export skip logic for adapters and quantizer params.
src/megatron/bridge/models/conversion/param_mapping.py Registers quantized parallel linear module names and generalizes fused LayerNorm-column detection.
src/megatron/bridge/models/conversion/model_bridge.py Updates local-to-global expert name handling and applies the new skip predicate in conversion task construction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/megatron/bridge/models/conversion/peft_bridge.py Outdated
Comment thread src/megatron/bridge/models/conversion/model_bridge.py Outdated
@HollowMan6
Copy link
Copy Markdown
Member Author

/claude review

Comment thread src/megatron/bridge/models/conversion/model_bridge.py Outdated
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 17, 2026

Review — Support for modelopt with MoE QAT

Overall this looks solid. The separation of grouped-expert (weight N / bias N) vs. local-expert (local_experts.N) naming is clean and well-tested.

What I checked

  • model_bridge.py: The new is_local_expert_param / is_grouped_expert_param split correctly routes EP name globalization. The regex avoids matching quantizer params like weight_quantizer._amax. The replace(..., 1) count-limit on the local_experts branch prevents accidental double-replacement.
  • param_mapping.py: QuantColumnParallelLinear / QuantRowParallelLinear registry additions are straightforward. The substring-based _FUSED_LAYER_NORM_COLUMN_PARALLEL_SUBSTRING check is a good generalization.
  • utils.py: extract_sort_key correctly falls back to local_experts.N when no (weight|bias)N suffix is found.
  • common_utils.py: Pattern ordering in extract_expert_number_from_param is correct — local_experts is most specific and checked first.
  • Test file: 11 focused unit tests cover the key paths including adapter exclusion, quantizer param passthrough, EP gather with local experts, and the new parallelism detection for Quant modules.

Minor inline comment

  • One readability nit on operator precedence (not x in y -> x not in y) posted inline.

Suggested test cases

No perf tests impacted.

Refer to verl-project/verl@b96c8fb

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6
Copy link
Copy Markdown
Member Author

/claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 17, 2026

Code Review: Support for modelopt with MoE QAT

The changes cleanly extend expert param handling to support SequentialMLP local_experts.<N> naming alongside the existing grouped expert weight<N>/bias<N> naming, and add Quant*ParallelLinear types to the module registry. The regex work in _megatron_local_name_to_global is careful about avoiding false positives on quantizer params like weight_quantizer._amax.

Observations

  1. Test coverage is solid. The new test file covers all the key code paths: local expert mapping, grouped expert mapping, adapter exclusion, quantizer exclusion, EP gather, parallelism type detection, and registry contents.

  2. _FUSED_LAYER_NORM_COLUMN_PARALLEL_SUBSTRING is a good generalization. Switching from an exact class-name set to a substring check future-proofs against any *LayerNormColumnParallelLinear variant without needing registry updates.

  3. Minor: _detect_parallelism_type test only covers the non-modelopt path. The test at line 159 creates a plain type(...) module, so is_modelopt_dynamic_module returns False. In the real modelopt scenario, the module is dynamic and get_original_cls_by_level(level=0).__name__ is used. The substring check works either way, but a test exercising the dynamic-module branch would strengthen confidence. Not a blocker.

  4. gather_from_ep_ranks regex note (param_mapping.py:769-770). The pre-existing re.sub(r"experts\.(\d+)", ...) on HF param names works because HF params use experts.N (not local_experts.N). Worth noting this would need updating if any HF model family adopts local_experts naming.

Suggested test cases

No perf tests impacted.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@yaoyu-33 yaoyu-33 added area:quant Quantization (PTQ, QAT, FP8 recipes) feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer labels May 17, 2026
@yaoyu-33 yaoyu-33 merged commit a75738a into NVIDIA-NeMo:main May 20, 2026
101 checks passed
@HollowMan6 HollowMan6 deleted the qat_patch branch May 20, 2026 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:quant Quantization (PTQ, QAT, FP8 recipes) feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants