Fix bugs in initial_load_in_hf when enable_weight_tying=true in Qwen3 #1964
Conversation
Add checks for weight tying in state_dict processing
Hi @Achazwl! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
    if key not in to_hf_map:
        continue
    # Skip output.weight if weight tying is enabled (HF checkpoint won't have lm_head.weight)
    if self.model_args.enable_weight_tying and key == "output.weight":
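For context, a minimal standalone sketch of the to_hf direction with this check; the function name, signature, and to_hf_map used here are illustrative assumptions, not the actual torchtitan adapter code:

    # Sketch (assumed names): convert a torchtitan-style state dict to HF naming,
    # skipping the tied output projection so the result matches HF checkpoints
    # that only ship model.embed_tokens.weight.
    def to_hf(state_dict, to_hf_map, enable_weight_tying):
        hf_state_dict = {}
        for key, value in state_dict.items():
            if key not in to_hf_map:
                continue
            # Skip output.weight if weight tying is enabled
            # (the HF checkpoint won't have lm_head.weight).
            if enable_weight_tying and key == "output.weight":
                continue
            hf_state_dict[to_hf_map[key]] = value
        return hf_state_dict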
Checking the 0.6B and 1.7B model weights, I see they do have separate weights for embed_tokens and lm_head, and I assumed these two weights are the same (please correct me if I am wrong), so loading the same weights twice is fine here.
1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json
0.6B: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors
I see, your change makes sense. Our previous code would fail when loading the 4B model weights: the 4B model doesn't have "lm_head.weight" in its checkpoint files, but our translated hf_state_dict would still have the key lm_head.weight. Did you verify that the updated code is still on par with the HF forward pass? cc @shuhuayu
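As a side note, one rough way to confirm which tensors a given HF checkpoint actually ships is to read its safetensors index; this sketch assumes network access and that the repo provides a sharded model.safetensors.index.json (as Qwen3-1.7B does):

    # Rough sketch: inspect the weight_map of a sharded HF checkpoint to see
    # whether lm_head.weight is stored separately or tied (and thus absent).
    import json
    from huggingface_hub import hf_hub_download

    index_path = hf_hub_download("Qwen/Qwen3-1.7B", "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]

    print("lm_head.weight" in weight_map)             # is a separate lm_head stored?
    print("model.embed_tokens.weight" in weight_map)  # embedding is always present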
    # copy from embed_tokens.weight
    if self.model_args.enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        if "model.embed_tokens.weight" in hf_state_dict:
            hf_state_dict = dict(hf_state_dict)  # Make a copy to avoid modifying original
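Pieced together, the from_hf side under discussion would look roughly like the sketch below; the function name, signature, and from_hf_map are illustrative assumptions, and only the weight-tying branch mirrors the diff above:

    # Sketch (assumed names): when the HF checkpoint was saved with tied weights,
    # lm_head.weight is missing, so it is filled in from model.embed_tokens.weight
    # before the key-by-key translation.
    def from_hf(hf_state_dict, from_hf_map, enable_weight_tying):
        if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
            if "model.embed_tokens.weight" in hf_state_dict:
                # Shallow copy so the caller's dict is untouched; whether this
                # copy is needed is exactly what the review below discusses.
                hf_state_dict = dict(hf_state_dict)
                hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
        return {from_hf_map[k]: v for k, v in hf_state_dict.items() if k in from_hf_map}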
Do you need to make a shallow copy of the dict? Can you elaborate more on "avoid modifying original"?
Without dict(hf_state_dict), the line hf_state_dict["lm_head.weight"] = ... would directly mutate the dictionary object provided by the caller. I'm not sure whether the caller expects the input dictionary to be modified, so I made a copy to avoid any potential side effects.
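To illustrate the trade-off with a generic Python example (not the adapter code itself):

    # Mutating the caller's dict directly: the new key becomes visible to the caller.
    caller_dict = {"model.embed_tokens.weight": "W"}
    view = caller_dict
    view["lm_head.weight"] = "W"
    print("lm_head.weight" in caller_dict)  # True

    # With dict(...), only the adapter's shallow copy gains the new key; the
    # values themselves are still shared, only the key-to-value mapping is copied.
    caller_dict = {"model.embed_tokens.weight": "W"}
    copied = dict(caller_dict)
    copied["lm_head.weight"] = "W"
    print("lm_head.weight" in caller_dict)  # False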
If not necessary, revert this line.
Thanks, the input hf_state_dict will not be used after calling the from_hf() function:

    state_dict = self.sd_adapter.from_hf(hf_state_dict)

It should be fine to mutate the dictionary object (hf_state_dict) in place.
Ok, I've removed the shallow copy.
Thx for fixing this bug! lgtm.
Co-authored-by: Shuhua Yu <[email protected]>
Hi @Achazwl, thanks for the contribution! Do you wanna fix the lint and re-run CI so we can merge it?