Add Param2 model bridge #3834
| @@ -0,0 +1,446 @@ | |||
| import contextlib | |||
| import fnmatch | |||
Missing NVIDIA copyright header. Per project rules, all new Python files (except under tests/) must include the Apache 2.0 copyright header with year 2026. Same applies to param_17B_provider.py.
| from megatron.bridge.models.param2 import ( | ||
| Param2ModelProvider | ||
| ) |
Formatting issues: 2-space indent (should be 4), trailing whitespace after Param2ModelProvider, and missing trailing comma. Also, Param2Bridge should be imported here to match the pattern of other model bridges.
| from megatron.bridge.models.param2 import ( | |
| Param2ModelProvider | |
| ) | |
| from megatron.bridge.models.param2 import ( | |
| Param2Bridge, | |
| Param2ModelProvider, | |
| ) |
| f"produced by build_conversion_tasks()." | ||
| ) | ||
| return filtered | ||
Bug: moe_router_enable_expert_bias defaults to False in provider_bridge() (line 80) but to True in megatron_to_hf_config() here, and True in the provider dataclass. This asymmetry means an HF→Megatron→HF roundtrip will silently flip this value from False to True when the source config doesn't set it explicitly. The default should be consistent — likely True to match the provider class.
| f"produced by build_conversion_tasks()." | |
| ) | |
| return filtered | |
| "moe_router_enable_expert_bias": bool( | |
| getattr(provider, "moe_router_enable_expert_bias", False) | |
| ), |
Or change the default in provider_bridge() to True to match the provider dataclass — pick one and be consistent.
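One way to keep the two directions from drifting apart is to read the fallback from a single shared constant. The sketch below is illustrative only: `DEFAULT_EXPERT_BIAS`, `import_flag`, and `export_flag` are hypothetical names, not functions in this PR, and the value `True` is an assumption taken from the provider dataclass default mentioned above.

```python
from types import SimpleNamespace

# Hypothetical single source of truth for the fallback value.
# True is assumed here to match the provider dataclass default.
DEFAULT_EXPERT_BIAS = True
KEY = "moe_router_enable_expert_bias"


def import_flag(hf_config: dict) -> bool:
    """HF -> Megatron direction: read the flag, falling back to the shared default."""
    return bool(hf_config.get(KEY, DEFAULT_EXPERT_BIAS))


def export_flag(provider: object) -> bool:
    """Megatron -> HF direction: read the flag, falling back to the same default."""
    return bool(getattr(provider, KEY, DEFAULT_EXPERT_BIAS))


# When the value is unset on both sides, the two directions now agree.
imported_unset = import_flag({})
exported_unset = export_flag(SimpleNamespace())

# An explicit value survives a round trip unchanged.
provider = SimpleNamespace(**{KEY: import_flag({KEY: False})})
roundtrip_explicit = export_flag(provider)
```

With both fallbacks tied to one constant, changing the default later only requires touching one line.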
| ), | ||
| AutoMapping( | ||
| megatron_param=f"decoder.layers.{layer}.mlp.experts.linear_fc2.weight{expert}", | ||
| hf_param=f"model.layers.{layer}.mlp.experts.{expert}.down_proj.weight", |
Project rules prohibit bare print() — use logging.getLogger(__name__) or print_rank_0(). There's another bare print() at line 409 with the same issue.
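A minimal sketch of the logging-based replacement, using only the standard library. The helper `report_skipped` and its message text are hypothetical, since the actual `print()` calls are not shown in this thread.

```python
import logging

# Module-level logger, as the project rule suggests.
logger = logging.getLogger(__name__)


def report_skipped(skipped: list[str]) -> int:
    """Hypothetical helper: log each skipped parameter name instead of printing it."""
    for name in skipped:
        # Lazy %-style formatting defers string building until the record is emitted.
        logger.warning("Skipping unmapped parameter: %s", name)
    return len(skipped)
```

Unlike a bare `print()`, the output can then be filtered by level or silenced per module through the standard logging configuration.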
| __all__ = [ | ||
| "Param2ModelProvider", | ||
| ] |
Param2Bridge is imported (line 14) but not exported in __all__. Other model packages (DeepSeek, OlMoE, Sarvam, etc.) export both the bridge and provider.
| __all__ = [ | |
| "Param2ModelProvider", | |
| ] | |
| __all__ = [ | |
| "Param2Bridge", | |
| "Param2ModelProvider", | |
| ] |
Review: Add Param2 model bridge
Thanks for the contribution! Found a few issues that should be addressed before merge:
Critical
Must fix
Observations
Suggested test cases
No perf tests impacted.
What does this PR do ?
This PR adds the model bridge and provider files for Param-2 17B.
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information