Skip to content

Add Param2 model bridge#3834

Open
meghmak13 wants to merge 1 commit into
mainfrom
megh/param2
Open

Add Param2 model bridge#3834
meghmak13 wants to merge 1 commit into
mainfrom
megh/param2

Conversation

@meghmak13
Copy link
Copy Markdown

What does this PR do ?

This PR add model bridge and provide files for Param-2 17B
Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@@ -0,0 +1,446 @@
import contextlib
import fnmatch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing NVIDIA copyright header. Per project rules, all new Python files (except under tests/) must include the Apache 2.0 copyright header with year 2026. Same applies to param_17B_provider.py.

Comment on lines +158 to +160
from megatron.bridge.models.param2 import (
Param2ModelProvider
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting issues: 2-space indent (should be 4), trailing whitespace after Param2ModelProvider, and missing trailing comma. Also, Param2Bridge should be imported here to match the pattern of other model bridges.

Suggested change
from megatron.bridge.models.param2 import (
Param2ModelProvider
)
from megatron.bridge.models.param2 import (
Param2Bridge,
Param2ModelProvider,
)

Comment on lines +354 to +358
f"produced by build_conversion_tasks()."
)

return filtered

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: moe_router_enable_expert_bias defaults to False in provider_bridge() (line 80) but to True in megatron_to_hf_config() here, and True in the provider dataclass. This asymmetry means an HF→Megatron→HF roundtrip will silently flip this value from False to True when the source config doesn't set it explicitly. The default should be consistent — likely True to match the provider class.

Suggested change
f"produced by build_conversion_tasks()."
)
return filtered
"moe_router_enable_expert_bias": bool(
getattr(provider, "moe_router_enable_expert_bias", False)
),

Or change the default in provider_bridge() to True to match the provider dataclass — pick one and be consistent.

Comment on lines +322 to +325
),
AutoMapping(
megatron_param=f"decoder.layers.{layer}.mlp.experts.linear_fc2.weight{expert}",
hf_param=f"model.layers.{layer}.mlp.experts.{expert}.down_proj.weight",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project rules prohibit bare print() — use logging.getLogger(__name__) or print_rank_0(). There's another bare print() at line 409 with the same issue.

Comment on lines +20 to +22
__all__ = [
"Param2ModelProvider",
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Param2Bridge is imported (line 14) but not exported in __all__. Other model packages (DeepSeek, OlMoE, Sarvam, etc.) export both the bridge and provider.

Suggested change
__all__ = [
"Param2ModelProvider",
]
__all__ = [
"Param2Bridge",
"Param2ModelProvider",
]

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 14, 2026

Review: Add Param2 model bridge

Thanks for the contribution! Found a few issues that should be addressed before merge:

Critical

  1. Roundtrip bug -- moe_router_enable_expert_bias default mismatch: provider_bridge() defaults this to False, but megatron_to_hf_config() defaults it to True, and the provider dataclass also defaults to True. An HF-to-Megatron-to-HF roundtrip will silently flip this value when the source config does not set it. Pick one default and use it consistently.

Must fix

  1. Missing NVIDIA copyright headers: Both param_17B_bridge.py and param_17B_provider.py are missing the required Apache 2.0 copyright header (year 2026 per project rules).

  2. Bare print() calls: Two uses of bare print() in param_17B_bridge.py (lines 322 and 409). Project rules require logging.getLogger(name) or print_rank_0() instead.

  3. Malformed import in models/init.py: The Param2 import block uses 2-space indent, has trailing whitespace, and is missing a trailing comma. See inline suggestion.

  4. Param2Bridge not exported: Param2Bridge is imported in param2/init.py but missing from all. It is also not imported or listed in the top-level models/init.py. Other model families export both bridge and provider.

Observations

  • No unit tests are included for the new bridge/provider. At minimum, a weight-mapping roundtrip test would help catch issues like the default mismatch above.
  • The bridge hardcodes architecture constants (NUM_LAYERS = 21, NUM_EXPERTS = 64, etc.) as class attributes rather than reading from the provider/config. This works for a single model variant but will need refactoring if other Param2 sizes are added.

Suggested test cases

No perf tests impacted.

@yaoyu-33 yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-more-tests Requires additional L0 and L1 test coverage before merge needs-review PR is ready for code review and waiting on a reviewer waiting-on-maintainers Waiting on maintainers to respond and removed needs-review PR is ready for code review and waiting on a reviewer labels May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-more-tests Requires additional L0 and L1 test coverage before merge waiting-on-maintainers Waiting on maintainers to respond

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants