
Conversation

@fegin
Contributor

@fegin fegin commented Oct 12, 2025

Stack from ghstack (oldest at bottom):

This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.

[ghstack-poisoned]
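For illustration only, a minimal sketch of what sharding the raw batch along the sequence dimension looks like. The helper `shard_for_cp()` below is hypothetical and uses a plain contiguous split; the actual `_context_parallel_shard()` API may use a load-balanced layout and also handles the FlexAttention BlockMask.

```python
# Hypothetical sketch, not the torch.distributed API: shard each buffer's
# sequence dimension so every CP rank keeps only its own chunk, instead of
# wrapping the step in the context_parallel() context manager.
import torch


def shard_for_cp(buffers, seq_dims, cp_rank, cp_world_size):
    """Keep this CP rank's chunk of each buffer along its sequence dim."""
    sharded = []
    for buf, dim in zip(buffers, seq_dims):
        # Contiguous split for simplicity; the real API may load-balance.
        sharded.append(buf.tensor_split(cp_world_size, dim=dim)[cp_rank])
    return tuple(sharded)


# Usage: shard inputs/labels once, right after the dataloader.
inputs = torch.randint(0, 32000, (8, 4096))   # (batch, seq) token ids
labels = torch.randint(0, 32000, (8, 4096))
inputs, labels = shard_for_cp((inputs, labels), seq_dims=(1, 1),
                              cp_rank=0, cp_world_size=4)
print(inputs.shape)  # torch.Size([8, 1024])
```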
@meta-cla bot added the CLA Signed label on Oct 12, 2025
fegin added a commit that referenced this pull request Oct 12, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.


ghstack-source-id: d30bc9f
Pull-Request: #1857
XilunWu added a commit that referenced this pull request Oct 15, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.


ghstack-source-id: d30bc9f
Pull-Request: #1857

[ghstack-poisoned]
XilunWu added a commit that referenced this pull request Oct 16, 2025
… llama3"

This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.


Pull-Request: #1857

[ghstack-poisoned]
XilunWu added a commit that referenced this pull request Oct 16, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.


Pull-Request: #1857

[ghstack-poisoned]
fegin added a commit that referenced this pull request Oct 27, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.

ghstack-source-id: 5d04d61
Pull-Request: #1857
[ghstack-poisoned]
fegin added a commit that referenced this pull request Oct 28, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.

ghstack-source-id: 673d743
Pull-Request: #1857
[ghstack-poisoned]
fegin added a commit that referenced this pull request Oct 28, 2025
This PR uses the latest CP APIs to enable FlexAttention + CP for llama3. It removes the usage of the `context_parallel()` context manager and instead uses `_context_parallel_shard()` to shard the input data.

ghstack-source-id: 1bff8da
Pull-Request: #1857
fegin added a commit that referenced this pull request Oct 28, 2025
Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0)
(oldest at bottom):
* #1857
* __->__ #1939

TorchTitan doesn't need compiled_autograd, which exists to support compiled DDP; TorchTitan will instead adopt fully_shard-based replicate. Let's remove it.
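For reference, a minimal sketch of applying `fully_shard` (the API that a fully_shard-based replicate strategy builds on), assuming PyTorch >= 2.6 and a process group initialized via torchrun; the toy model is illustrative, not torchtitan's llama3 definition.

```python
# Minimal sketch, assuming torch.distributed.fsdp.fully_shard (PyTorch >= 2.6)
# and a process group already initialized (e.g. launched with torchrun). This
# shows plain fully_shard application, not the replicate strategy built on it.
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard

mesh = init_device_mesh("cuda", (dist.get_world_size(),))
model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024)).cuda()

# Shard each layer, then the root module; no compiled DDP, hence no need for
# compiled_autograd in the training loop.
for layer in model:
    fully_shard(layer, mesh=mesh)
fully_shard(model, mesh=mesh)
```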
Review comment from @fegin (Contributor Author) on the diff context:

    yield input_dict, labels

    def forward_backward_step(
    def post_dataloader_step(

This method.

fegin added a commit that referenced this pull request Nov 4, 2025
We are adding more actions to convert the raw inputs and labels.

1. The new CP can do the input/label/BlockMask sharding in this method.
2. The experimental full-DTensor model can simply override this method without changing much Trainer code.

This method is extracted from #1857.

Making this a standalone PR allows us to continue the two projects above without one blocking the other.


ghstack-source-id: d1882a7
Pull-Request: #1985
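A rough sketch of the hook described above. Everything beyond the method names visible in the diff (`post_dataloader_step`, `forward_backward_step`) is an assumption; see #1985 and #1857 for the actual implementation.

```python
# Hypothetical sketch of the hook described above; the input_dict["input"] key,
# the cp_enabled flag, and the cp_shard() helper are assumptions, not the
# actual torchtitan code.
from torch.nn.attention.flex_attention import create_block_mask


class Trainer:
    ...

    def post_dataloader_step(self, input_dict, labels):
        """Convert raw dataloader outputs before forward_backward_step().

        Keeping this in one overridable method means (1) CP can shard the
        inputs/labels/BlockMask here, and (2) the experimental full-DTensor
        model can override just this method without touching the Trainer loop.
        """
        tokens = input_dict["input"]          # assumed key name
        batch, seq_len = tokens.shape

        def causal(b, h, q_idx, kv_idx):      # causal mask for FlexAttention
            return q_idx >= kv_idx

        block_mask = create_block_mask(causal, batch, None, seq_len, seq_len)

        if self.cp_enabled:                   # assumed attribute
            tokens, labels, block_mask = self.cp_shard(tokens, labels, block_mask)

        input_dict["input"] = tokens
        return input_dict, labels, block_mask
```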