[models] refactor: Remove size-specific provider classes #3854
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 527a44d
  # Model configuration
- model_cfg = WanModelProvider1_3B()
+ model_cfg = WanModelProvider()
Bug: Unlike wan_14b_pretrain_config, which passes explicit architecture values, the 1.3B recipe relies on the WanModelProvider() base-class defaults matching the 1.3B architecture. If anyone later changes those defaults (e.g. to match a different default size), this recipe silently breaks.
For consistency and safety, pass the 1.3B architecture values explicitly, same as the 14B recipe does:
- model_cfg = WanModelProvider()
+ model_cfg = WanModelProvider(
+     num_layers=30,
+     hidden_size=1536,
+     ffn_hidden_size=8960,
+     num_attention_heads=12,
+     crossattn_emb_size=1536,
+     seq_length=1024,
+ )
Fixed in f38d4fa687c9e07904b9350687bac90788ae2d8f: wan_1_3b_pretrain_config() now passes the 1.3B architecture values explicitly to WanModelProvider, matching the old WanModelProvider1_3B defaults. I also extended the Wan recipe test to assert crossattn_emb_size and seq_length.
Validation:
- uv run pre-commit run --all-files passed.
- cw job 11790766 passed: 15 passed, 33 warnings in 2.66s.
- Log: /lustre/fsw/portfolios/coreai/projects/coreai_dlalgo_llm/users/yuya/MB-Codex-6-remove-size-providers-test/logs/pr3854-wan_11790766.log
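For anyone following along, the kind of assertion described above can be sketched roughly as follows. This is not the test from the commit: StubWanModelProvider and stub_wan_1_3b_model_cfg are hypothetical stand-ins so the snippet runs without importing Bridge, and the stub defaults are deliberately set to placeholder zeros to show that the recipe no longer depends on them. The six architecture values are the ones from the suggestion in this thread.

```python
from dataclasses import asdict, dataclass


@dataclass
class StubWanModelProvider:
    """Hypothetical stand-in for WanModelProvider (the real class has more fields)."""

    # Placeholder defaults, deliberately NOT the 1.3B values.
    num_layers: int = 0
    hidden_size: int = 0
    ffn_hidden_size: int = 0
    num_attention_heads: int = 0
    crossattn_emb_size: int = 0
    seq_length: int = 0


# 1.3B architecture values, copied from the review suggestion.
WAN_1_3B_ARCH = {
    "num_layers": 30,
    "hidden_size": 1536,
    "ffn_hidden_size": 8960,
    "num_attention_heads": 12,
    "crossattn_emb_size": 1536,
    "seq_length": 1024,
}


def stub_wan_1_3b_model_cfg() -> StubWanModelProvider:
    """Build the model config the way the fixed recipe does: explicit values only."""
    return StubWanModelProvider(**WAN_1_3B_ARCH)


def test_wan_1_3b_arch_is_explicit() -> None:
    cfg = stub_wan_1_3b_model_cfg()
    # Check every architecture field, including crossattn_emb_size and seq_length.
    for field, expected in WAN_1_3B_ARCH.items():
        assert getattr(cfg, field) == expected, field


test_wan_1_3b_arch_is_explicit()
```

Because the stub defaults are all zero, the test would fail if the recipe fell back to bare defaults, which is the regression the inline comment was worried about.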
Review: [models] refactor: Remove size-specific provider classes
Clean refactor: the removals are consistent, tests are updated, docs are refreshed, and the new AST-based guard test is a nice touch to prevent regression.
Issues
- wan_1_3b_pretrain_config relies on base-class defaults instead of explicit values: The 14B recipe correctly passes explicit architecture fields to WanModelProvider(...), but the 1.3B recipe uses bare WanModelProvider() and depends on the base-class defaults happening to match the 1.3B architecture. If someone later changes the base-class defaults, the 1.3B recipe silently breaks. See inline comment for a suggested fix.
Observations
Suggested test cases
No perf tests impacted.
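For context on the AST-based guard mentioned in the review, a check of that general shape might look like the sketch below. It is an illustration only, not the test added in this PR: the function name find_size_specific_providers and the SIZE_SUFFIX naming rule are assumptions, chosen so that a name like WanModelProvider1_3B is flagged while the base WanModelProvider is not.

```python
import ast
import re

# Assumed naming rule for size-specific classes, e.g. WanModelProvider1_3B.
# The real guard test may use a different rule.
SIZE_SUFFIX = re.compile(r"Provider\d+(_\d+)?B$")


def find_size_specific_providers(source: str) -> list[str]:
    """Return names of classes in `source` that look like size-specific providers."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef) and SIZE_SUFFIX.search(node.name)
    ]


# Small in-memory module used only to exercise the helper.
SAMPLE_SOURCE = '''
class WanModelProvider:
    pass

class WanModelProvider1_3B(WanModelProvider):
    pass
'''

offenders = find_size_specific_providers(SAMPLE_SOURCE)
print(offenders)
```

A guard test would run a helper like this over the provider modules and assert that the returned list is empty, so reintroducing a size-specific class fails CI without importing any heavy dependencies.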
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Fixed the unit failures from
Root cause:
Fix summary:
Validation:
Pushed commit:
/ok to test 3535066
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test f38d4fa
Summary
Validation
- uv run pre-commit run --all-files passed.
- git diff --cached --check passed before commit.
- .venv lock treats torch as externally provided, and the active torch environment fails Bridge import because transformer_engine is missing and also reports a local CUDA driver mismatch. No full test suite was run.