[DTensor Bugfix] Explicitly specify grad_placements in to_local to ensure necessary all reduce takes place #2532
Open
acisseJZhong wants to merge 4 commits into main from
Conversation
Solves #2217 by @alpemreacar.
Problem:
x.to_local() was called on a multi-dimensional DTensor (e.g., on a (dp, tp) mesh with placements like (Shard(0), Replicate())). The bare to_local() strips all mesh dimensions and loses gradient placement info for the non-DP (TP) dimensions. For a Replicate() RMSNorm weight on the TP mesh, the backward should produce Partial() gradients (requiring an all-reduce), but this information was lost.
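A minimal sketch of the failure mode. The mesh sizes and tensor shapes here are illustrative assumptions, not the exact torchtune code, and the import path assumes the public torch.distributed.tensor module from recent PyTorch releases:

```python
# Sketch only: assumes 4 GPUs arranged as a 2x2 (dp, tp) mesh.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Replicate, Shard, distribute_tensor

mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

# A DTensor sharded on the dp dimension and replicated on the tp dimension.
x = distribute_tensor(torch.randn(8, 16), mesh, placements=(Shard(0), Replicate()))

# Bare to_local() returns a plain tensor that carries no mesh information,
# so autograd has no way to know the gradient w.r.t. x is Partial() on tp.
local_x = x.to_local()
```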
Fix:
Explicitly pass grad_placements when calling x.to_local(). This ensures that in the backward pass, gradients flowing back through to_local() are properly wrapped as a DTensor with Partial() on the TP mesh dimension, which will trigger the necessary all-reduce across TP ranks (see the sketch below).

Authored with Claude.
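A hedged sketch of the fix in the same setup as above. The grad_placements tuple is an assumption matching the description in this PR, not copied from the diff:

```python
from torch.distributed.tensor import Partial, Shard

# Declare how gradients w.r.t. the local tensor sit on the mesh: still
# sharded on dp, but Partial() on tp. When autograd wraps the incoming
# gradient back into a DTensor, the Partial() placement triggers the
# all-reduce across TP ranks that bare to_local() silently dropped.
local_x = x.to_local(grad_placements=(Shard(0), Partial()))
```

With the reduction restored, a Replicate() RMSNorm weight on the TP mesh receives the fully reduced gradient instead of a rank-local partial one.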