Update torchtitan/models/qwen3/model/state_dict_adapter.py
Co-authored-by: Shuhua Yu <[email protected]>
Achazwl and shuhuayu authored Oct 30, 2025
commit 978746fac0a1f041c21d3b6aab96660a56bfceef
1 change: 0 additions & 1 deletion torchtitan/models/qwen3/model/state_dict_adapter.py
```diff
@@ -104,7 +104,6 @@ def to_hf(self, state_dict: dict[str, Any]) -> dict[str, Any]:
             else:
                 if key not in to_hf_map:
                     continue
-                # Skip output.weight if weight tying is enabled (HF checkpoint won't have lm_head.weight)
                 if self.model_args.enable_weight_tying and key == "output.weight":
                     continue
                 new_key = to_hf_map[key]
```
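For readers outside the repo, a toy version of the rule this diff exercises may help. This is a sketch, not the actual adapter: the key names mirror the diff and torchtitan's conventions (`tok_embeddings.weight`, `output.weight`), and the loop is simplified for illustration.

```python
# Toy illustration of the weight-tying skip rule above (not the actual
# adapter; key names mirror the diff, the rest is simplified).
import torch

to_hf_map = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "output.weight": "lm_head.weight",
}

def to_hf(state_dict, enable_weight_tying):
    hf_state_dict = {}
    for key, value in state_dict.items():
        if key not in to_hf_map:
            continue
        # A tied HF checkpoint has no lm_head.weight, so emit nothing for it.
        if enable_weight_tying and key == "output.weight":
            continue
        hf_state_dict[to_hf_map[key]] = value
    return hf_state_dict

tied = to_hf(
    {"tok_embeddings.weight": torch.zeros(2, 2), "output.weight": torch.zeros(2, 2)},
    enable_weight_tying=True,
)
assert "lm_head.weight" not in tied  # output.weight was skipped
```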
Review discussion on the `enable_weight_tying` check:

@wwwjn (Contributor) commented on Oct 29, 2025:
Checking the 0.6B and 1.7B model weights, they do have separate weights for `embed_tokens` and `lm_head`, and I assumed these two weights are the same (please correct me if I am wrong), so loading the same weights twice is OK here.

1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json
0.6B: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors

I see your change makes sense. Our previous code would fail when loading the 4B model weights: the 4B model doesn't have "lm_head.weight" in its checkpoint files, but our translated hf_state_dict would still have the key lm_head.weight. Did you verify that the updated code is still on par with the HF forward pass? cc @shuhuayu
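The reviewer's assumption that the two tensors hold identical values can be sanity-checked directly against the linked checkpoint. A minimal sketch, assuming the `huggingface_hub` and `safetensors` packages and the standard HF key names:

```python
# Minimal sketch: check whether embed_tokens and lm_head hold the same values
# in the Qwen3-0.6B checkpoint linked above.
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download("Qwen/Qwen3-0.6B", "model.safetensors")
weights = load_file(path)

embed = weights["model.embed_tokens.weight"]
lm_head = weights.get("lm_head.weight")

if lm_head is None:
    print("No separate lm_head.weight; the weights are tied in the file itself")
else:
    print("Identical values:", torch.equal(embed, lm_head))
```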

A contributor replied:
Thanks for catching the bug when loading the Qwen3 4B model. I did a forward parity check, and it works well.

[Image: forward parity check results]
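For reference, a forward parity check of the kind mentioned here compares logits from the converted model and the HF reference on the same input. A hedged sketch follows; `titan_forward` is a placeholder stand-in (the real check would call the torchtitan Qwen3 model loaded through the state-dict adapter):

```python
# Sketch of a forward parity check against the HF reference model. Since the
# torchtitan model isn't available here, titan_forward wraps the HF model
# itself so the sketch runs; substitute the real converted model in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"
tok = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
hf_model.eval()

def titan_forward(input_ids):
    # Placeholder: replace with the torchtitan model's forward pass.
    return hf_model(input_ids=input_ids).logits

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    hf_logits = hf_model(**inputs).logits
    titan_logits = titan_forward(inputs["input_ids"])

# Parity holds if the largest logit difference is within numerical noise.
print("max |diff| =", (hf_logits - titan_logits).abs().max().item())
```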
