Skip to content

[models] refactor: Remove size-specific provider classes#3854

Open
yaoyu-33 wants to merge 3 commits into
mainfrom
dispatch/remove-size-specific-providers
Open

[models] refactor: Remove size-specific provider classes#3854
yaoyu-33 wants to merge 3 commits into
mainfrom
dispatch/remove-size-specific-providers

Conversation

@yaoyu-33
Copy link
Copy Markdown
Contributor

Summary

  • Remove public size-specific provider subclasses across Gemma, Gemma2, Gemma3, GPT-175B, Mistral Small, Ministral3, Nemotron VL, and Wan.
  • Update recipes, exports, docs, and tests to configure base providers directly instead of importing size-specific subclasses.
  • Add a unit guard that fails if provider class names encode model sizes.

Validation

  • uv run pre-commit run --all-files passed.
  • git diff --cached --check passed before commit.
  • Targeted pytest was attempted, but this workstation environment cannot run it: the project .venv lock treats torch as externally provided, and the active torch environment fails Bridge import because transformer_engine is missing and also reports a local CUDA driver mismatch. No full test suite was run.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 527a44d

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


# Model configuration
model_cfg = WanModelProvider1_3B()
model_cfg = WanModelProvider()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Unlike wan_14b_pretrain_config which passes explicit architecture values, the 1.3B recipe relies on WanModelProvider() base-class defaults matching the 1.3B architecture. If anyone later changes the base-class defaults (e.g. to match a different default size), this recipe silently breaks.

For consistency and safety, pass the 1.3B architecture values explicitly, same as the 14B recipe does:

Suggested change
model_cfg = WanModelProvider()
model_cfg = WanModelProvider(
num_layers=30,
hidden_size=1536,
ffn_hidden_size=8960,
num_attention_heads=12,
crossattn_emb_size=1536,
seq_length=1024,
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f38d4fa687c9e07904b9350687bac90788ae2d8f: wan_1_3b_pretrain_config() now passes the 1.3B architecture values explicitly to WanModelProvider, matching the old WanModelProvider1_3B defaults. I also extended the Wan recipe test to assert crossattn_emb_size and seq_length.

Validation:

  • uv run pre-commit run --all-files passed.
  • cw job 11790766 passed: 15 passed, 33 warnings in 2.66s.
  • Log: /lustre/fsw/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/MB-Codex-6-remove-size-providers-test/logs/pr3854-wan_11790766.log

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 15, 2026


Review: [models] refactor: Remove size-specific provider classes

Clean refactor — the removals are consistent, tests are updated, docs are refreshed, and the new AST-based guard test is a nice touch to prevent regression.

Issues

wan_1_3b_pretrain_config relies on base-class defaults instead of explicit values — The 14B recipe correctly passes explicit architecture fields to WanModelProvider(...), but the 1.3B recipe uses bare WanModelProvider() and depends on the base-class defaults happening to match the 1.3B architecture. If someone later changes the base-class defaults, the 1.3B recipe silently breaks. See inline comment for a suggested fix.

Observations

  • The NemotronVL rename from NemotronNano12Bv2VLModelProvider to NemotronVLModelProvider is clean — all src, test, and init references are updated consistently.
  • The guard test regex correctly distinguishes model-family numbers (e.g. Ministral3, Gemma3) from size suffixes (1B, 175B, 1_3B).
  • No stale references to removed class names remain in src/, tests/, docs/, or scripts/.
  • compare_provider_configs import removed from mistral test is fine — still used by olmoe tests.
  • No perf/recipe configs under scripts/performance/configs/ are touched.

Suggested test cases

No perf tests impacted.

@yaoyu-33 yaoyu-33 added area:model Model implementations and HF bridge logic breaking-change Public behavior or API compatibility changes feature New capabilities, enhancements, or enablement work full-test-suite waiting-on-maintainers Waiting on maintainers to respond labels May 15, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

Fixed the unit failures from Launch_Unit_Tests_Core.

Root cause:

  • test_no_size_specific_provider_classes.py was matching digits immediately before the M in ModelProvider, so versioned family names like T5ModelProvider, Gemma2ModelProvider, and Ministral3ModelProvider were false positives.
  • gpt3_175b_pretrain_config() moved from the removed GPTProvider175B subclass to GPTModelProvider, but kept the old bias_dropout_add_fusion keyword. Current MCore/Bridge config uses bias_dropout_fusion.

Fix summary:

  • Tightened the provider-size detector to require explicit size-token boundaries and added positive/negative coverage for version numbers vs. model sizes.
  • Updated the GPT3-175B recipe to pass bias_dropout_fusion=True.

Validation:

  • uv run pre-commit run --all-files passed.
  • cw Slurm job 11788239 passed: 30 passed, 34 warnings in 4.47s.
  • Log: /lustre/fsw/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/MB-Codex-6-remove-size-providers-test/logs/pr3854-unit5_11788239.log
  • Command: uv run --active --no-sync python -m pytest tests/unit_tests/models/test_no_size_specific_provider_classes.py tests/unit_tests/recipes/gpt/test_gpt3_175b.py -v

Pushed commit: 3535066f30e00fdfbcea2aa751c3a610e376f453.

@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test 3535066

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33
Copy link
Copy Markdown
Contributor Author

/ok to test f38d4fa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic breaking-change Public behavior or API compatibility changes feature New capabilities, enhancements, or enablement work full-test-suite waiting-on-maintainers Waiting on maintainers to respond

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant