
Conversation

@3outeille (Contributor) commented Dec 15, 2025

This fixes: huggingface#6

Thanks to this, we can now run torch.compile + 4D parallelism on HF models (cf. huggingface#5)

@meta-cla bot added the CLA Signed label Dec 15, 2025
@3outeille changed the title from "Upgrade transformers from 4.57.1 to 5.0.0rc0" to "[transformers_modeling_backend] Upgrade transformers from 4.57.1 to 5.0.0rc0" Dec 15, 2025
@3outeille (Contributor, Author)

Upgrading to transformers v5 fixes it, as v5 no longer uses kwargs for self.attn (#2154)
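
As a toy illustration of the calling-convention change described above (hypothetical module names, not the actual transformers code): forwarding opaque **kwargs into self.attn versus passing explicit arguments, the latter giving torch.compile a stable signature to trace.

    import torch
    import torch.nn as nn

    class Attn(nn.Module):
        def forward(self, x, attention_mask=None):
            return x if attention_mask is None else x * attention_mask

    # v4.x-style block: kwargs are forwarded opaquely into the attention call
    class BlockKwargs(nn.Module):
        def __init__(self):
            super().__init__()
            self.attn = Attn()

        def forward(self, x, **kwargs):
            return self.attn(x, **kwargs)

    # v5-style block: arguments are explicit, so the traced call signature is fixed
    class BlockExplicit(nn.Module):
        def __init__(self):
            super().__init__()
            self.attn = Attn()

        def forward(self, x, attention_mask=None):
            return self.attn(x, attention_mask=attention_mask)

    x, mask = torch.randn(2, 4), torch.ones(2, 4)
    out = torch.compile(BlockExplicit())(x, attention_mask=mask)
    print(out.shape)  # torch.Size([2, 4])

(This toy compiles fine either way; the actual failure in huggingface#6 involved the kwargs pattern interacting with Tensor Parallel.)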

@3outeille closed this Dec 15, 2025
@3outeille reopened this Dec 15, 2025
@3outeille requested a review from tianyu-l December 15, 2025 18:31
@wwwjn (Contributor) left a comment:


LGTM!

@3outeille (Contributor, Author)

fixed linting

@3outeille requested a review from wwwjn December 16, 2025 22:05
     if module.padding_idx is not None:
-        module.weight.data[module.padding_idx].zero_()
+        if isinstance(module.weight.data, DTensor):
+            module.weight.data._local_tensor[module.padding_idx].zero_()
A Contributor commented on this diff:

Sorry, I probably didn't understand what you are doing here.
If the padding is on the "global tensor", we should just do the same thing: module.weight.data[module.padding_idx].zero_()

The code here is doing a local modification, which may or may not be correct depending on whether padding_idx is meant to be local or global.
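
A minimal sketch of that distinction (the function name and sharding layout are assumptions, not from the PR): with an embedding weight sharded on dim 0, a global padding_idx must be translated into a local row offset, and only the rank whose shard owns that row should zero it.

    import torch

    def zero_global_padding_row(local_weight: torch.Tensor, padding_idx: int,
                                rank: int, rows_per_rank: int) -> None:
        """Zero the padding row with *global* semantics on a dim-0 sharded weight.

        Only the rank whose shard contains the global row touches its local tensor;
        blindly indexing every local shard with the global padding_idx would zero a
        different row on every rank (or go out of range).
        """
        start = rank * rows_per_rank  # first global row held by this rank
        if start <= padding_idx < start + rows_per_rank:
            local_weight[padding_idx - start].zero_()

    # Example: vocab of 8 rows split across 2 ranks (4 rows each), padding_idx = 5.
    # Only rank 1 owns global row 5, which is its local row 1.
    for rank in range(2):
        shard = torch.ones(4, 3)
        zero_global_padding_row(shard, padding_idx=5, rank=rank, rows_per_rank=4)
        print(rank, shard[1])  # rank 0 -> ones (untouched); rank 1 -> zeros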


Labels: CLA Signed


Successfully merging this pull request may close this issue: HF modeling with torch.compile doesn't work when used with Tensor Parallel
