Fix bugs in initial_load_in_hf when enable_weight_tying=true in Qwen3
Add checks for weight tying in state_dict processing
Achazwl authored Oct 29, 2025
commit d57d3a0806fec893dea911315f1781c8871dc10b
10 changes: 10 additions & 0 deletions torchtitan/models/qwen3/model/state_dict_adapter.py
@@ -104,6 +104,9 @@ def to_hf(self, state_dict: dict[str, Any]) -> dict[str, Any]:
else:
    if key not in to_hf_map:
        continue
    # Skip output.weight if weight tying is enabled (HF checkpoint won't have lm_head.weight)
    if self.model_args.enable_weight_tying and key == "output.weight":
@wwwjn (Contributor) commented on Oct 29, 2025:

Checking the 0.6B and 1.7B model weights, they do have separate entries for embed_tokens and lm_head, and I assume these two weights are the same (please correct me if I am wrong), so loading the same weights twice is OK here.

1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json
0.6B: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors

I see, your change makes sense. Our previous code would fail when loading the 4B model weights: the 4B model doesn't have "lm_head.weight" in its checkpoint files, but our translated hf_state_dict would still have the key lm_head.weight. Did you verify that the updated code is still on par with HF forward? cc @shuhuayu
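
For reference, a minimal sketch (not part of the PR) of how one might confirm this for a given Qwen3 checkpoint is below. It assumes huggingface_hub and safetensors are installed and that the checkpoint is a single model.safetensors file, as with Qwen3-0.6B.

```python
# Sketch: check whether a Qwen3 checkpoint ships a separate lm_head.weight
# and whether it equals model.embed_tokens.weight. Illustrative only.
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download("Qwen/Qwen3-0.6B", "model.safetensors")
weights = load_file(path)

embed = weights["model.embed_tokens.weight"]
lm_head = weights.get("lm_head.weight")
if lm_head is None:
    print("No lm_head.weight stored; the checkpoint relies on weight tying.")
else:
    print("lm_head.weight equals embed_tokens.weight:", torch.equal(embed, lm_head))
```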

Contributor reply:

Thanks for catching the bug when loading the Qwen3 4B model. I did a forward parity check, and it works well.
[Image: forward parity check results]
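
For readers who cannot see the attached image, a parity check along these lines might look like the sketch below. This is an illustrative reconstruction, not the script actually used; the tolerances and the way each model's forward pass is wrapped are assumptions.

```python
# Sketch of a forward parity check: compare logits from two functions that each
# map input_ids to a [batch, seq, vocab] logits tensor (e.g., a wrapped HF model
# and a wrapped torchtitan model). Illustrative only.
import torch

def forward_parity(hf_forward, titan_forward, input_ids, atol=1e-4, rtol=1e-4):
    with torch.no_grad():
        hf_logits = hf_forward(input_ids)
        titan_logits = titan_forward(input_ids)
    print("max abs diff:", (hf_logits - titan_logits).abs().max().item())
    return torch.allclose(hf_logits, titan_logits, atol=atol, rtol=rtol)

# Example wiring for the HF side (the torchtitan side depends on how that model
# was built and loaded, so it is left to the caller):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
#   hf_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B").eval()
#   ids = tok("The capital of France is", return_tensors="pt").input_ids
#   forward_parity(lambda x: hf_model(x).logits, titan_forward, ids)
```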

        continue
    new_key = to_hf_map[key]
    hf_state_dict[new_key] = value

@@ -118,6 +121,13 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
state_dict = {}
expert_weights_by_layer = {}  # {layer: {abstract_key: {expert_id: tensor}}}

# If weight tying is enabled and lm_head.weight is not in HF checkpoint,
# copy from embed_tokens.weight
if self.model_args.enable_weight_tying and "lm_head.weight" not in hf_state_dict:
    if "model.embed_tokens.weight" in hf_state_dict:
        hf_state_dict = dict(hf_state_dict)  # Make a copy to avoid modifying original
@wwwjn (Contributor) commented on Oct 29, 2025:

Do you need to make a shallow copy of the dict? Can you elaborate on "avoid modifying the original"?

Author reply:

Without dict(hf_state_dict), the line hf_state_dict["lm_head.weight"] = ... would directly mutate the dictionary object provided by the caller. I'm not sure whether the caller expects the input dictionary to be modified, so I made a copy to avoid any potential side effects.
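
As a small illustration of the semantics in question (not from the PR): dict(hf_state_dict) creates a new top-level mapping, so inserting lm_head.weight into the copy leaves the caller's dict untouched, while the tensor values themselves are still shared rather than duplicated.

```python
# Illustration of dict() shallow-copy semantics (toy values, not real tensors).
original = {"model.embed_tokens.weight": "embedding-tensor"}

copied = dict(original)  # new dict object, same value references
copied["lm_head.weight"] = copied["model.embed_tokens.weight"]

print("lm_head.weight" in original)   # False: caller's dict is not mutated
print("lm_head.weight" in copied)     # True
print(copied["lm_head.weight"] is original["model.embed_tokens.weight"])  # True: value shared, not copied
```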

Author follow-up:

If it's not necessary, I can revert this line.

Contributor reply:

Thanks. The input hf_state_dict will not be used after calling the from_hf() function:

state_dict = self.sd_adapter.from_hf(hf_state_dict)

It should be OK to mutate the dictionary object (hf_state_dict) directly.

Author reply:

> Thanks. The input hf_state_dict will not be used after calling the from_hf() function, so it should be OK to mutate the dictionary object (hf_state_dict) directly.

OK, I've removed the shallow copy.

        hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]

for key, value in hf_state_dict.items():
    if "mlp.experts" in key:
        abstract_key = re.sub(r"(\d+)", "{}", key, count=2)
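
Taken together, the from_hf change amounts to the small pre-processing step sketched below (a standalone illustration of the logic above, not torchtitan code), applied before the usual key-translation loop:

```python
from typing import Any

def fill_tied_lm_head(hf_state_dict: dict[str, Any], enable_weight_tying: bool) -> dict[str, Any]:
    """If weight tying is enabled and the checkpoint omits lm_head.weight,
    reuse model.embed_tokens.weight so downstream key translation finds it."""
    if (
        enable_weight_tying
        and "lm_head.weight" not in hf_state_dict
        and "model.embed_tokens.weight" in hf_state_dict
    ):
        hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict

# A tied checkpoint such as Qwen3-4B ships only the embedding tensor:
ckpt = {"model.embed_tokens.weight": "embedding-tensor"}
print("lm_head.weight" in fill_tied_lm_head(ckpt, enable_weight_tying=True))  # True
```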