Representing DTensor in thunder traces by kshitij12345 · Pull Request #1907 · Lightning-AI/lightning-thunder

kshitij12345 · 2025-03-26T10:53:36Z

Design Doc - https://docs.google.com/document/d/1Gqb_jXrL-sSqs-D8KrZdcQinxuUSlccZBnnvbYJfYl0/edit?usp=sharing

Changes -
This PR adds support for DTensor inputs to a jitted function. Most of the additions needed to support DTensor, such as DTensorProxy and the related prims, live in thunder/torch/experimental.

NOTE:

This PR adds only the basic infrastructure needed to run a simple DTensor program (torch.mul with no broadcasting). Broader operator coverage will follow in subsequent PRs.
The thunderfx path currently fails (this PR adds a test asserting that). It will be fixed in a separate PR.

Following are the main updates:

Prologue: Adds a new primitive, check_dtensor_spec_repr, which matches the repr of the DTensorSpec of the DTensor in question (see the example below). The PR also makes sure that, besides the DTensorSpec check, there is a tensor metadata check for the DTensor object as well as for the local tensor that it points to. NOTE: Another option for checking the DTensorSpec would be to keep the input's DTensorSpec in the TracingContext and have the prologue verify equality.
DTensorProxy: Adds a new Proxy object to represent the DTensor. This class inherits from TensorProxy, since DTensor is a tensor subclass and implements all the same methods that a tensor implements.
Prims and Operations: For the computation trace, we add prims and torch-level operations for DTensor. We add new prims and operations instead of reusing the existing ones to prevent executors from claiming an operation on a DTensor by mistake.
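
A minimal sketch of why separate symbols help, assuming a toy executor that claims operations by symbol name and an isinstance check on its inputs. The names ToyTensorProxy, ToyDTensorProxy and toy_executor_claims are hypothetical stand-ins and not Thunder's API. Because the DTensor proxy is a subclass of the tensor proxy, a check written for plain tensors also accepts it, so reusing the regular mul symbol could let such an executor claim a DTensor operation.

```python
from dataclasses import dataclass


@dataclass
class ToyTensorProxy:
    # Hypothetical stand-in for a tensor proxy; only the shape is modeled.
    shape: tuple


@dataclass
class ToyDTensorProxy(ToyTensorProxy):
    # Hypothetical stand-in for a DTensor proxy: a tensor proxy plus placements.
    placements: tuple = ()


# Symbols this toy executor knows how to run (an assumption for illustration).
CLAIMED_BY_TOY_EXECUTOR = {"mul"}


def toy_executor_claims(symbol_name, *args):
    # The executor only checks for tensor proxies, so subclasses pass too.
    return symbol_name in CLAIMED_BY_TOY_EXECUTOR and all(
        isinstance(a, ToyTensorProxy) for a in args
    )


x = ToyDTensorProxy(shape=(16, 16), placements=("Shard(dim=0)",))
w = ToyDTensorProxy(shape=(16, 16), placements=("Shard(dim=0)",))

# Reusing the regular symbol: the executor claims a DTensor operation.
claimed_when_reused = toy_executor_claims("mul", x, w)
# A dedicated symbol: the executor does not know it and leaves it alone.
claimed_when_separate = toy_executor_claims("dtensor_mul", x, w)
```

In this toy model, claimed_when_reused is True and claimed_when_separate is False, which is the accidental claim the dedicated DTensor symbols are meant to rule out.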
Representation in trace -

Example Program

from torch.distributed.tensor import DTensor
from torch.distributed import init_device_mesh
import torch
import os

os.environ["RANK"] = "0"
os.environ["LOCAL_RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
mesh = init_device_mesh("cuda", (1,), mesh_dim_names=["i"])

x_dtensor = DTensor.from_local(torch.randn(2, 2), device_mesh=mesh)
w_dtensor = DTensor.from_local(torch.randn(2, 2), device_mesh=mesh)

import thunder

@thunder.jit
def fn(x, w):
    return x * w

fn(x_dtensor, w_dtensor)

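Prologue Trace (relevant snippet). Note that the traces below show 16x16 inputs with Shard(dim=0) placements on a two-device mesh, so they appear to come from a different run than the single-rank 2x2 program above.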

# print(fn._lc_cs.last_prologue_traces[-1])
@torch.no_grad()
@no_autocast
def prologue(*args, **kwargs):
  # args: "Any"
  prims.check_len(args, 2)
  # kwargs: "Any"
  prims.check_len(kwargs, 0)
  l_x_: "DTensor cuda:0 f32[16, 16]" = args[0]
  l_w_: "DTensor cuda:0 f32[16, 16]" = args[1]
  dtensor_spec0: "<class 'NoneType'>" = l_x_._spec
  thunder.torch.experimental.dtensor_prims_and_impl.check_dtensor_spec_repr(dtensor_spec0, "DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([16, 16]), stride=(16, 1), dtype=torch.float32))")
  t1: "cuda:0 f32[8, 16]" = l_x_._local_tensor
  prims.check_tensor_shape_and_metadata(t1, (8, 16), 'cuda:0', torch.float32, True)
  prims.check_tensor_shape_and_metadata(l_x_, (16, 16), 'cuda:0', torch.float32, True)
  dtensor_spec2: "<class 'NoneType'>" = l_w_._spec
  thunder.torch.experimental.dtensor_prims_and_impl.check_dtensor_spec_repr(dtensor_spec2, "DTensorSpec(mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),), tensor_meta=TensorMeta(shape=torch.Size([16, 16]), stride=(16, 1), dtype=torch.float32))")
  t3: "cuda:0 f32[8, 16]" = l_w_._local_tensor
  prims.check_tensor_shape_and_metadata(t3, (8, 16), 'cuda:0', torch.float32, False)
  prims.check_tensor_shape_and_metadata(l_w_, (16, 16), 'cuda:0', torch.float32, False)
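
The repr-based check in the prologue can be sketched as below. This is an illustration under stated assumptions, not the implementation in dtensor_prims_and_impl: ToySpec is a hypothetical stand-in for DTensorSpec, toy_check_spec_repr is a hypothetical helper, and raising RuntimeError on mismatch is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySpec:
    # Hypothetical stand-in for DTensorSpec; fields are kept as plain values.
    mesh: str
    placements: tuple


def toy_check_spec_repr(spec, expected_repr):
    # Compare the repr of the runtime spec against the repr recorded at trace time.
    actual = repr(spec)
    if actual != expected_repr:
        raise RuntimeError(f"spec mismatch: expected {expected_repr}, got {actual}")


# The repr recorded when the function was traced.
traced_spec = ToySpec(mesh="DeviceMesh('cuda', [0, 1])", placements=("Shard(dim=0)",))
expected = repr(traced_spec)

# Same mesh and placements: the check passes silently.
toy_check_spec_repr(
    ToySpec(mesh="DeviceMesh('cuda', [0, 1])", placements=("Shard(dim=0)",)), expected
)

# Different placements: the check rejects the input.
mismatch_detected = False
try:
    toy_check_spec_repr(
        ToySpec(mesh="DeviceMesh('cuda', [0, 1])", placements=("Replicate()",)), expected
    )
except RuntimeError:
    mismatch_detected = True
```

Comparing reprs keeps the expected value a plain string that can be embedded in the generated prologue, at the cost of depending on the repr format staying stable.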

Computation Trace: The torch-level symbol dtensor_mul is decomposed into prims for DTensor operations.

# print(fn._lc_cs.last_traces[0])
@torch.no_grad()
@no_autocast
def computation(x, w):
  # x: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  # w: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"

  # /opt/pytorch/lightning-thunder/test_dtensor.py:21: 	    return torch.mul(x, w)
  dtensor_6 = thunder.torch.experimental.dtensor_torch_and_prims.dtensor_mul(x, w)  # dtensor_6: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
    # dtensor_6 = thunder.torch.experimental.dtensor_torch_and_prims.dtensor_mul_prim(x, w)  # dtensor_6: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  return (dtensor_6,)

Backward Trace (initial trace)

# print(fn._lc_cs.last_backward_traces[0])
@torch.no_grad()
@no_autocast
def backward_fn(saved_for_backward, cotangents):
  # saved_for_backward: "Collection"
  # cotangents: "Collection"
  C0, _, = saved_for_backward
  # C0: "Collection"
  # None
  clear_mutable_collection(saved_for_backward)
  del saved_for_backward
  dtensor_0, = cotangents
  # dtensor_0: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  clear_mutable_collection(cotangents)
  del cotangents
  w, x, = C0
  # w: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  # x: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  clear_mutable_collection(C0)
  del C0
  bw_dtensor_19 = dtensor_mul_prim(w, dtensor_0)  # bw_dtensor_19: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
    # bw_dtensor_19 = thunder.torch.experimental.dtensor_torch_and_prims.dtensor_mul_prim(w, dtensor_0)  # bw_dtensor_19: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  del w
  bw_dtensor_22 = dtensor_mul_prim(x, dtensor_0)  # bw_dtensor_22: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
    # bw_dtensor_22 = thunder.torch.experimental.dtensor_torch_and_prims.dtensor_mul_prim(x, dtensor_0)  # bw_dtensor_22: "DTensor cuda:0 f32[16, 16] mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)"
  del x, dtensor_0
  return (bw_dtensor_19, bw_dtensor_22)
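
The two products in the backward trace follow the usual product rule for y = x * w: the gradient with respect to x is w times the cotangent, and the gradient with respect to w is x times the cotangent. A minimal scalar sketch, using plain Python floats rather than DTensors and a hypothetical helper mul_backward, checks this against central finite differences.

```python
def mul_backward(x, w, g):
    # Mirrors the order in the trace: (w * cotangent, x * cotangent).
    return w * g, x * g


x, w, g = 3.0, -2.0, 0.5
grad_x, grad_w = mul_backward(x, w, g)

# Central finite differences of y = x * w, scaled by the cotangent g.
eps = 1e-6
fd_x = g * ((x + eps) * w - (x - eps) * w) / (2 * eps)
fd_w = g * (x * (w + eps) - x * (w - eps)) / (2 * eps)

max_error = max(abs(grad_x - fd_x), abs(grad_w - fd_w))
```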

Thank you Masaki, Ivan and Mike for the helpful discussions and guidance!

kshitij12345 · 2025-06-10T13:16:24Z

Gentle ping @IvanYashchuk

kshitij12345 · 2025-06-13T10:35:44Z

With the latest merge, I am seeing a failure in the test for the ConstantFolding transform. The error seems legitimate, and I am not sure why the test passed before (it should have failed previously as well).

Cause -
This happens because ConstantFolding relies on the internal map _torch_to_thunder_function_map (from torch_fn to Symbol) -

lightning-thunder/thunder/transforms/constant_folding.py, line 22 in 0802746:

_thunder_to_torch_function_map = {v: k for k, v in _torch_to_thunder_function_map.items()}

But this PR changes the map to go from torch_fn to a Callable that dispatches to the correct Symbol -

lightning-thunder/thunder/torch/experimental/dtensor_torch_and_prims.py, lines 49 to 55 in f688c98:

def register_function_for_dtensor(torch_fn, single_device_symbol, dtensor_symbol, is_method=False):
    register_function(torch_fn, dispatch_to_impl(single_device_symbol, dtensor_symbol))
    if is_method:
        method_name: str = torch_fn.__name__
        torch_method: None | Callable = getattr(torch.Tensor, method_name, None)
        register_method_for_dtensor(torch_method, single_device_symbol, dtensor_symbol)
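
A toy model of the interaction, assuming the failure is a lookup miss in the inverted map. All names here (ToySymbol, toy_torch_mul, toy_dispatch_to_impl and its is_dtensor attribute check) are hypothetical; the body of the real dispatch_to_impl is not shown in this thread. Once the map's values are dispatching callables, inverting it yields a dictionary keyed by those callables, so a lookup by Symbol no longer finds the torch function.

```python
class ToySymbol:
    # Hypothetical stand-in for a Thunder Symbol.
    def __init__(self, name):
        self.name = name


def toy_torch_mul(a, b):
    # Stand-in for a torch function used as a registry key.
    return a * b


def toy_dispatch_to_impl(single_device_symbol, dtensor_symbol):
    # Assumed behavior: choose the DTensor symbol if any argument is marked as a DTensor.
    def dispatcher(*args):
        if any(getattr(a, "is_dtensor", False) for a in args):
            return dtensor_symbol
        return single_device_symbol

    return dispatcher


mul_symbol = ToySymbol("mul")
dtensor_mul_symbol = ToySymbol("dtensor_mul")

# Before: torch_fn -> Symbol. After: torch_fn -> dispatching callable.
map_before = {toy_torch_mul: mul_symbol}
map_after = {toy_torch_mul: toy_dispatch_to_impl(mul_symbol, dtensor_mul_symbol)}

# The same inversion as in constant_folding.py, applied to both maps.
inverted_before = {v: k for k, v in map_before.items()}
inverted_after = {v: k for k, v in map_after.items()}

found_before = inverted_before.get(mul_symbol)  # the torch function
found_after = inverted_after.get(mul_symbol)  # None: keys are now dispatchers
```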

Workaround -
I have a fix in mind for ConstantFolding so that it no longer relies on this map. I think we should xfail the test in this PR, and I will send a follow-up PR to get the ConstantFolding test passing again.

cc: @IvanYashchuk

kshitij12345 · 2025-06-13T14:49:13Z

Ping @t-vi for stamp

t-vi

Thank you @kshitij12345 @IvanYashchuk @crcrpar

t-vi · 2025-06-13T15:17:51Z

@kshitij12345 will you file an issue about USE_DISTRIBUTED=OFF?

kshitij12345 · 2025-06-13T15:32:01Z

Opened #2233 to track that thunder works with PyTorch compiled without distributed support.

kshitij12345 added 12 commits March 21, 2025 20:28

dtensor support

5873742

add comment

377125a

add more comments

7ab82f6

update comment

e6aa8d3

add test for execpted failing cases

e76fc17

support for method

eaac9f7

update failing case test

94ef69d

remove generated traces

5d81851

undo pre-commit change

7277753

undo debug changes

a8c58e4

update failing test to use thunder.jit

d87b103

update registration helper

b101161

crcrpar reviewed Mar 27, 2025

kshitij12345 and others added 3 commits March 31, 2025 18:55

Apply suggestions from code review

b551cb8

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>

Merge branch 'main' of github.com:Lightning-AI/lightning-thunder into…

1c75a80

… dtensor-init-support

address review and upadte

5854c86

IvanYashchuk added the DTensor Issues about DTensor support in Thunder label Apr 2, 2025

kshitij12345 added 13 commits April 2, 2025 14:26

update dtensor proxy repr

a778830

Merge branch 'main' of github.com:Lightning-AI/lightning-thunder into…

41990d0

… dtensor-init-support

Merge branch 'main' of github.com:Lightning-AI/lightning-thunder into…

eda0277

… dtensor-init-support

Merge branch 'main' of github.com:Lightning-AI/lightning-thunder into…

8abf040

… dtensor-init-support

update jit_ext access to torchfn_to_thunder registry : test

225f2e3

empty commit

2b85b31

Revert "update jit_ext access to torchfn_to_thunder registry : test"

5d0296f

This reverts commit 225f2e3.

temp commit

dedab03

Merge branch 'main' of github.com:Lightning-AI/lightning-thunder into…

efaae1d

… dtensor-init-support

update to manual decomp

ddcf208

add manual grad rule

6a6bf11

update

2a8ea02

update - clean-up

9490cac

undo changes for thunderfx path, fix later

e285e97

IvanYashchuk added 9 commits June 11, 2025 13:04

Move import of is_dtensor_proxy to the top of the file

284d109

Move import of check_dtensor_spec_repr to the top of the file

b26e8b7

format dtensor imports

a7eed41

Move import of handle_check_dtensor_spec_in_prologue to the top of th…

8bd1b99

…e file

Return only fake tensors from run_with_fake_tensor

8cad766

Remove unused aot_function

f4dc375

Remove TracingContext, tracing usage

c8459a2

Inline FakeTensorMode into _run_with_fake

2cf4c39

Rename _run_with_fake->run_with_fake_tensor

1949011

IvanYashchuk approved these changes Jun 11, 2025

crcrpar reviewed Jun 11, 2025

kshitij12345 and others added 6 commits June 12, 2025 08:22

Merge branch 'main' of https://github.com/Lightning-AI/lightning-thunder

fdfbb79

into dtensor-init-support

address review

031eba0

Merge branch 'dtensor-init-support' of https://github.com/kshitij1234…

d600791

…5/lightning-thunder into dtensor-init-support

[pre-commit.ci] auto fixes from pre-commit.com hooks

384b7c6

for more information, see https://pre-commit.ci

update

8bc5228

Merge branch 'dtensor-init-support' of https://github.com/kshitij1234…

f688c98

…5/lightning-thunder into dtensor-init-support

kshitij12345 and others added 2 commits June 13, 2025 03:39

xfail constant_folding test

27a4c26

[pre-commit.ci] auto fixes from pre-commit.com hooks

9291923

for more information, see https://pre-commit.ci

t-vi approved these changes Jun 13, 2025

t-vi enabled auto-merge (squash) June 13, 2025 15:15

t-vi merged commit d665072 into Lightning-AI:main Jun 13, 2025
49 checks passed

kshitij12345 mentioned this pull request Jun 13, 2025

DTensor: Verify thunder can be imported even if PyTorch without distributed is used. #2233

Closed

github-actions bot deleted the dtensor-init-support branch September 13, 2025 00:47


4 participants
