Adding prefetching of first shards to train script when fsdp enabled by chelsea0x3b · Pull Request #1955 · pytorch/torchtitan

chelsea0x3b · 2025-10-28T17:46:00Z

If the model is sharded, calling .unshard() will prefetch the first shard. I placed this before the data loader and other preprocessing, so the all-gather should overlap with that work.

Sources:

Issuing 1st all-gather earlier: Implicit prefetching happens at the time of calling model(x). The 1st all-gather gets exposed. We can call model.unshard() explicitly earlier to issue 1st all-gather earlier
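The ordering described above can be sketched as follows. This is only an illustration, not the code in this PR: `train_step` and `data_iterator` are hypothetical names, and `model` is assumed to be an FSDP2 module that exposes `unshard(async_op=True)`.

```python
from typing import Any, Iterator


def train_step(model: Any, data_iterator: Iterator[Any]) -> Any:
    """Issue the first all-gather before CPU-side data loading, then run forward."""
    # Assumption: `model` is an FSDP2 module. With async_op=True the call
    # returns without waiting for the all-gather to finish.
    model.unshard(async_op=True)

    # CPU-side work (data loading, preprocessing) that the all-gather can overlap with.
    batch = next(data_iterator)

    # FSDP2 waits on the pending unshard in the module's pre-forward,
    # so no explicit wait is needed before calling the model.
    return model(batch)
```

Whether this ordering saves any time depends on whether the first all-gather is actually exposed in the trace.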

tianyu-l · 2025-10-28T22:13:49Z

torchtitan/train.py

+            if self.parallel_dims.fsdp_enabled:
+                # NOTE: prefetches the model
+                self.model_parts[0].unshard(async_op=True)


Curious how much benefit do we get from this? Could you show some traces?

I have the following concerns:

When we don't do logging (where getting the loss to CPU incurs a d2h sync), the GPU is ahead of the CPU and can already overlap with data loading, so this is not saving anything.

In torchtitan there are other FSDP implementations, and in general we should avoid FSDP2-only code. There's a way around it by testing if model_parts[0] is an FSDPModule, but it's not so clean.

Let's see if the benefit justifies the complexity. WDYT?
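For reference, the `FSDPModule` check mentioned above might look roughly like this. It is a sketch under assumptions: the helper name `maybe_prefetch_first_all_gather` is made up for illustration, and the import path assumes a recent PyTorch release that exposes FSDP2 under `torch.distributed.fsdp`.

```python
from typing import Any

try:
    # Assumption: FSDP2 is exposed at this path in the installed PyTorch.
    from torch.distributed.fsdp import FSDPModule
except ImportError:
    FSDPModule = None  # torch is missing or does not expose FSDP2 at this path


def maybe_prefetch_first_all_gather(model_part: Any) -> bool:
    """Issue the first all-gather early only if model_part is an FSDP2 module.

    Returns True if an unshard was issued, False otherwise.
    """
    if FSDPModule is not None and isinstance(model_part, FSDPModule):
        model_part.unshard(async_op=True)
        return True
    # Other FSDP implementations and unsharded models are left untouched.
    return False
```

In the train loop this would be called as `maybe_prefetch_first_all_gather(self.model_parts[0])` in place of the `fsdp_enabled` branch. It keeps the FSDP2-specific call behind a type check, at the cost of the extra branching the review calls out.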

chelsea0x3b · 2025-10-29

Ah yeah, turns out there is little benefit, you're right! I think I had logging set higher when I was noticing an improvement. Will close, ty!

chelsea0x3b added 2 commits October 28, 2025 17:41

Adding fsdp prefetching to train script

258a731

Making unshard a async_op

efaf3c7

chelsea0x3b requested review from fegin, tianyu-l, wconstab and wwwjn as code owners October 28, 2025 17:46

meta-cla bot added the CLA Signed label Oct 28, 2025

tianyu-l requested changes Oct 28, 2025

chelsea0x3b closed this Oct 29, 2025

chelsea0x3b wants to merge 2 commits into pytorch:main from chelsea0x3b:fsdp-prefetching
