Fix bugs in initial_load_in_hf when enable_weight_tying=true in Qwen3 #1964
base: main
Changes from 1 commit
d57d3a0
a005f0f
978746f
63d5b5f
8a23f6e
Add checks for weight tying in state_dict processing
```diff
@@ -104,6 +104,9 @@ def to_hf(self, state_dict: dict[str, Any]) -> dict[str, Any]:
         else:
             if key not in to_hf_map:
                 continue
+            # Skip output.weight if weight tying is enabled (HF checkpoint won't have lm_head.weight)
+            if self.model_args.enable_weight_tying and key == "output.weight":
+                continue
             new_key = to_hf_map[key]
             hf_state_dict[new_key] = value
```

Review comment (on the added check):

By checking the 0.6B and 1.7B model weights, they do have separate weights for `embed_tokens` and `lm_head`, and I assumed these two weights are the same (please correct me if I am wrong), so loading the same weights twice is OK here.
1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json

Follow-up comment:

I see, your change makes sense. Our previous code will fail when loading the 4B model weights: the 4B model doesn't have `lm_head.weight` in its checkpoint files, but our translated …
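To make the checkpoint-layout point above concrete, here is a small, hypothetical helper (not part of this PR; the function name and approach are assumptions) that reads a repo's `model.safetensors.index.json` and reports whether `lm_head.weight` is stored as a separate tensor. Per the discussion above, Qwen3-1.7B lists it while Qwen3-4B does not.

```python
# Hypothetical helper, not from this PR: check whether a Hugging Face repo's
# sharded-checkpoint index lists lm_head.weight as a separate tensor.
import json
import urllib.request


def checkpoint_has_lm_head(repo_id: str) -> bool:
    url = f"https://huggingface.co/{repo_id}/resolve/main/model.safetensors.index.json"
    with urllib.request.urlopen(url) as resp:
        index = json.load(resp)
    # "weight_map" maps each tensor name to the shard file that stores it.
    return "lm_head.weight" in index["weight_map"]


# Per the review discussion: True for Qwen/Qwen3-1.7B, False for Qwen/Qwen3-4B,
# which is why to_hf() must not emit lm_head.weight when weight tying is enabled.
print(checkpoint_has_lm_head("Qwen/Qwen3-1.7B"))
```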
```diff
@@ -118,6 +121,13 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
         state_dict = {}
         expert_weights_by_layer = {}  # {layer: {abstract_key: {expert_id: tensor}}}

+        # If weight tying is enabled and lm_head.weight is not in HF checkpoint,
+        # copy from embed_tokens.weight
+        if self.model_args.enable_weight_tying and "lm_head.weight" not in hf_state_dict:
+            if "model.embed_tokens.weight" in hf_state_dict:
+                hf_state_dict = dict(hf_state_dict)  # Make a copy to avoid modifying original
```

Achazwl marked this conversation as resolved.
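The diff view above cuts off before the end of the added block. The following standalone sketch is only a guess at what the complete handling does (the function name is hypothetical and the final assignment is assumed, not shown in this view):

```python
from typing import Any


# Hypothetical, standalone sketch of the weight-tying handling in from_hf():
# when tying is enabled and the HF checkpoint has no lm_head.weight, reuse the
# embedding weight for it so the later key translation can populate output.weight.
def fill_tied_lm_head(hf_state_dict: dict[str, Any], enable_weight_tying: bool) -> dict[str, Any]:
    if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        if "model.embed_tokens.weight" in hf_state_dict:
            hf_state_dict = dict(hf_state_dict)  # shallow copy, as in this commit
            # No tensor data is duplicated; the new key aliases the same tensor.
            hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict
```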
Review comment on the call site, `state_dict = self.sd_adapter.from_hf(hf_state_dict)`:

It should be OK to mutate the dictionary object (`hf_state_dict`).

Reply:

Thanks, the input `hf_state_dict` will not be used after calling the `from_hf()` function:
`state_dict = self.sd_adapter.from_hf(hf_state_dict)`

Follow-up reply:

> It should be OK to mutate the dictionary object (`hf_state_dict`).

OK, I've removed the shallow copy.
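For context on the shallow-copy point, here is a small, self-contained illustration (not code from this PR): `dict(d)` copies only the key-to-tensor mapping, the tensors themselves are shared either way, and since the caller never reads `hf_state_dict` again after `from_hf()` returns, adding the key in place is safe.

```python
import torch

# Illustration only (not from this PR): dict(d) is a shallow copy, so the
# tensor objects are shared between the original and the copy; only the
# key -> tensor mapping is duplicated.
hf_state_dict = {"model.embed_tokens.weight": torch.zeros(4, 8)}
copied = dict(hf_state_dict)

# Same tensor object in both dicts; no weight data was copied.
assert copied["model.embed_tokens.weight"] is hf_state_dict["model.embed_tokens.weight"]

# Mutating the original in place only adds a key, which is harmless here
# because the caller does not reuse hf_state_dict after from_hf() returns.
hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
assert "lm_head.weight" not in copied
```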

Uh oh!
There was an error while loading. Please reload this page.