[models] refactor: Remove size-specific provider classes by yaoyu-33 · Pull Request #3854 · NVIDIA-NeMo/Megatron-Bridge

yaoyu-33 · 2026-05-15T22:27:08Z

Summary

Remove public size-specific provider subclasses across Gemma, Gemma2, Gemma3, GPT-175B, Mistral Small, Ministral3, Nemotron VL, and Wan.
Update recipes, exports, docs, and tests to configure base providers directly instead of importing size-specific subclasses.
Add a unit guard that fails if provider class names encode model sizes.

Validation

uv run pre-commit run --all-files passed.
git diff --cached --check passed before commit.
Targeted pytest was attempted, but this workstation environment cannot run it: the project .venv lock treats torch as externally provided, and the active torch environment fails Bridge import because transformer_engine is missing and also reports a local CUDA driver mismatch. No full test suite was run.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T22:27:23Z

/ok to test 527a44d

copy-pr-bot · 2026-05-15T22:27:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

claude · 2026-05-15T22:31:07Z


    # Model configuration
-    model_cfg = WanModelProvider1_3B()
+    model_cfg = WanModelProvider()


Bug: Unlike wan_14b_pretrain_config which passes explicit architecture values, the 1.3B recipe relies on WanModelProvider() base-class defaults matching the 1.3B architecture. If anyone later changes the base-class defaults (e.g. to match a different default size), this recipe silently breaks.

For consistency and safety, pass the 1.3B architecture values explicitly, same as the 14B recipe does:

Suggested change

model_cfg = WanModelProvider()

model_cfg = WanModelProvider(

num_layers=30,

hidden_size=1536,

ffn_hidden_size=8960,

num_attention_heads=12,

crossattn_emb_size=1536,

seq_length=1024,

)

Fixed in f38d4fa687c9e07904b9350687bac90788ae2d8f: wan_1_3b_pretrain_config() now passes the 1.3B architecture values explicitly to WanModelProvider, matching the old WanModelProvider1_3B defaults. I also extended the Wan recipe test to assert crossattn_emb_size and seq_length.

Validation:

uv run pre-commit run --all-files passed.

cw job 11790766 passed: 15 passed, 33 warnings in 2.66s.

Log: /lustre/fsw/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/MB-Codex-6-remove-size-providers-test/logs/pr3854-wan_11790766.log

claude · 2026-05-15T22:35:35Z

Review: [models] refactor: Remove size-specific provider classes

Clean refactor — the removals are consistent, tests are updated, docs are refreshed, and the new AST-based guard test is a nice touch to prevent regression.

Issues

wan_1_3b_pretrain_config relies on base-class defaults instead of explicit values — The 14B recipe correctly passes explicit architecture fields to WanModelProvider(...), but the 1.3B recipe uses bare WanModelProvider() and depends on the base-class defaults happening to match the 1.3B architecture. If someone later changes the base-class defaults, the 1.3B recipe silently breaks. See inline comment for a suggested fix.

Observations

The NemotronVL rename from NemotronNano12Bv2VLModelProvider to NemotronVLModelProvider is clean — all src, test, and init references are updated consistently.
The guard test regex correctly distinguishes model-family numbers (e.g. Ministral3, Gemma3) from size suffixes (1B, 175B, 1_3B).
No stale references to removed class names remain in src/, tests/, docs/, or scripts/.
compare_provider_configs import removed from mistral test is fine — still used by olmoe tests.
No perf/recipe configs under scripts/performance/configs/ are touched.

Suggested test cases

No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T23:59:11Z

Fixed the unit failures from Launch_Unit_Tests_Core.

Root cause:

test_no_size_specific_provider_classes.py was matching digits immediately before the M in ModelProvider, so versioned family names like T5ModelProvider, Gemma2ModelProvider, and Ministral3ModelProvider were false positives.
gpt3_175b_pretrain_config() moved from the removed GPTProvider175B subclass to GPTModelProvider, but kept the old bias_dropout_add_fusion keyword. Current MCore/Bridge config uses bias_dropout_fusion.

Fix summary:

Tightened the provider-size detector to require explicit size-token boundaries and added positive/negative coverage for version numbers vs. model sizes.
Updated the GPT3-175B recipe to pass bias_dropout_fusion=True.

Validation:

uv run pre-commit run --all-files passed.
cw Slurm job 11788239 passed: 30 passed, 34 warnings in 4.47s.
Log: /lustre/fsw/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/MB-Codex-6-remove-size-providers-test/logs/pr3854-unit5_11788239.log
Command: uv run --active --no-sync python -m pytest tests/unit_tests/models/test_no_size_specific_provider_classes.py tests/unit_tests/recipes/gpt/test_gpt3_175b.py -v

Pushed commit: 3535066f30e00fdfbcea2aa751c3a610e376f453.

yaoyu-33 · 2026-05-15T23:59:15Z

/ok to test 3535066

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-16T04:05:04Z

/ok to test f38d4fa

[models] refactor: Remove size-specific provider classes

527a44d

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:28 Inactive

copy-pr-bot Bot temporarily deployed to test May 15, 2026 22:28 Inactive

claude Bot reviewed May 15, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:35 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:36 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:50 Inactive

yaoyu-33 added area:model Model implementations and HF bridge logic breaking-change Public behavior or API compatibility changes feature New capabilities, enhancements, or enablement work full-test-suite waiting-on-maintainers Waiting on maintainers to respond labels May 15, 2026

[models] fix: Address provider cleanup unit failures

3535066

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:59 Inactive

copy-pr-bot Bot temporarily deployed to test May 16, 2026 00:00 Inactive

copy-pr-bot Bot temporarily deployed to public May 16, 2026 00:07 Inactive

copy-pr-bot Bot temporarily deployed to public May 16, 2026 00:22 Inactive

[diffusion] fix: Make Wan 1.3B recipe explicit

f38d4fa

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 16, 2026 04:05 Inactive

copy-pr-bot Bot temporarily deployed to test May 16, 2026 04:05 Inactive

copy-pr-bot Bot temporarily deployed to public May 16, 2026 04:14 Inactive

copy-pr-bot Bot temporarily deployed to public May 16, 2026 04:30 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[models] refactor: Remove size-specific provider classes#3854

[models] refactor: Remove size-specific provider classes#3854
yaoyu-33 wants to merge 3 commits into
mainfrom
dispatch/remove-size-specific-providers

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

copy-pr-bot Bot commented May 15, 2026

Uh oh!

claude Bot May 15, 2026

Uh oh!

yaoyu-33 May 16, 2026

Uh oh!

claude Bot commented May 15, 2026

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    model_cfg = WanModelProvider()
+    model_cfg = WanModelProvider(
+        num_layers=30,
+        hidden_size=1536,
+        ffn_hidden_size=8960,
+        num_attention_heads=12,
+        crossattn_emb_size=1536,
+        seq_length=1024,
+    )

Conversation

yaoyu-33 commented May 15, 2026

Summary

Validation

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

copy-pr-bot Bot commented May 15, 2026

Uh oh!

claude Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

yaoyu-33 May 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 15, 2026

Review: [models] refactor: Remove size-specific provider classes

Issues

Observations

Suggested test cases

No perf tests impacted.

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 15, 2026

Uh oh!

yaoyu-33 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant