[RFC][WIP][CP] Enable FlexAttention CP for llama3 #1857

fegin · 2025-10-12T05:28:54Z

Stack from ghstack (oldest at bottom):

This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and uses `_context_parallel_shard()` to shard the input data.
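For readers unfamiliar with what "sharding the input data" means for CP, here is a minimal pure-Python sketch. It is not the signature or behavior of `_context_parallel_shard()`; the helper `shard_sequence` and the names `cp_rank` and `cp_world_size` are hypothetical, and the sketch assumes plain contiguous chunks along the sequence dimension, whereas a real CP implementation may use a load-balanced layout instead.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_sequence(tokens: Sequence[T], cp_rank: int, cp_world_size: int) -> List[T]:
    """Return the contiguous slice of `tokens` owned by `cp_rank` (hypothetical helper)."""
    if not 0 <= cp_rank < cp_world_size:
        raise ValueError("cp_rank must be in [0, cp_world_size)")
    if len(tokens) % cp_world_size != 0:
        raise ValueError("sequence length must be divisible by cp_world_size")
    chunk = len(tokens) // cp_world_size
    start = cp_rank * chunk
    return list(tokens[start : start + chunk])


# Inputs and labels must be sharded the same way so each rank's
# tokens still line up with their next-token targets.
inputs = list(range(8))     # token positions 0..7
labels = list(range(1, 9))  # next-token targets 1..8
cp_world_size = 2
shards = [
    (
        shard_sequence(inputs, rank, cp_world_size),
        shard_sequence(labels, rank, cp_world_size),
    )
    for rank in range(cp_world_size)
]
```

With two CP ranks, each rank ends up holding half of the sequence and the matching half of the labels, so no rank needs a surrounding context manager to see its own shard.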


Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom):
* #1857
* __->__ #1939

TorchTitan doesn't need compiled_autograd, which is meant to support compiled DDP, but TorchTitan will adopt fully_shard-based replicate. Let's remove it.

fegin · 2025-10-30T01:07:29Z

torchtitan/train.py

            yield input_dict, labels

-    def forward_backward_step(
+    def post_dataloader_step(


This method.

We are adding more actions to convert the raw inputs and labels:

1. The new CP can do the input/label/BlockMask sharding in this method.
2. The experimental full DTensor model can simply override this method without changing much Trainer code.

This method is extracted from #1857. Making this a standalone PR allows us to continue the two projects above without one blocking the other.

ghstack-source-id: d1882a7
Pull-Request: #1985
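A rough sketch of the override pattern described above, assuming a simplified trainer. `SketchTrainer`, `ShardingTrainer`, and `train_step` are hypothetical names used only for illustration and are not TorchTitan's `Trainer`; the hook name `post_dataloader_step` is taken from the diff in this PR, and its real signature may differ.

```python
from typing import Any, Dict, List, Tuple


class SketchTrainer:
    """Hypothetical stand-in for a trainer with a post-dataloader hook."""

    def post_dataloader_step(
        self, input_dict: Dict[str, Any], labels: List[int]
    ) -> Tuple[Dict[str, Any], List[int]]:
        # Default behavior: pass the raw batch through unchanged.
        return input_dict, labels

    def forward_backward_step(self, input_dict: Dict[str, Any], labels: List[int]) -> int:
        # Placeholder for the real forward/backward pass:
        # just report how many tokens this rank would process.
        return len(input_dict["input"])

    def train_step(self, input_dict: Dict[str, Any], labels: List[int]) -> int:
        # The hook runs once per batch, before forward/backward.
        input_dict, labels = self.post_dataloader_step(input_dict, labels)
        return self.forward_backward_step(input_dict, labels)


class ShardingTrainer(SketchTrainer):
    """Overrides only the hook; the rest of the training loop is untouched."""

    def __init__(self, cp_rank: int, cp_world_size: int) -> None:
        self.cp_rank = cp_rank
        self.cp_world_size = cp_world_size

    def post_dataloader_step(
        self, input_dict: Dict[str, Any], labels: List[int]
    ) -> Tuple[Dict[str, Any], List[int]]:
        # Assumption: contiguous sharding along the sequence dimension.
        chunk = len(labels) // self.cp_world_size
        start = self.cp_rank * chunk
        sharded = dict(input_dict, input=input_dict["input"][start : start + chunk])
        return sharded, labels[start : start + chunk]


batch = {"input": list(range(8))}
targets = list(range(1, 9))
full_tokens = SketchTrainer().train_step(batch, targets)
rank1_tokens = ShardingTrainer(cp_rank=1, cp_world_size=2).train_step(batch, targets)
```

The point of the design is that a subclass changes how the batch is prepared without touching `train_step` or `forward_backward_step`.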

Update

bc6af9b


fegin requested review from tianyu-l, wconstab and wwwjn as code owners October 12, 2025 05:28

meta-cla bot added the CLA Signed label Oct 12, 2025

fegin mentioned this pull request Oct 12, 2025

[RFC] Lift freqs_cis as an input of models #1797

Closed

XilunWu mentioned this pull request Oct 15, 2025

[RFC][WIP][CP] Enable FlexAttention CP for llama3 #1883

Draft

Update

e6d7374


fegin mentioned this pull request Oct 27, 2025

Remove the unused compiled_autograd option #1939

Merged

Update

8b3a18d


Update

6867d4a



fegin mentioned this pull request Nov 4, 2025

Add post_dataloading_processing method to Trainer #1985

Open
