[simple_fsdp] apply bucketing ag/rs passes, reordering collectives, sink #1464

Closed
IvanKobzarev wants to merge 1 commit into main from simplefsdp_passes

Conversation

@IvanKobzarev
Contributor

Depends on landing PyTorch PR
pytorch/pytorch#158663

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jul 25, 2025
@IvanKobzarev requested a review from @wconstab on July 25, 2025 16:28
logger.info("Applied Data Parallel (dp mode=%s) to the model", dp_mode)

if job_config.training.compile:
from torch._inductor.comms import (
Contributor

nit: you don't need to import. you can pass string names of the passes instead, e.g. passes = ["sink_waits_iterative", ...]

sink_waits_iterative,
)
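Following the nit above, the passes can be referenced by their registered string names instead of importing them from torch._inductor.comms. A rough sketch of that alternative (illustrating the reviewer's suggestion, not the code in this PR):

```python
import torch

# Sketch of the reviewer's suggestion: reference the Inductor
# reordering passes by string name, so no explicit import from
# torch._inductor.comms is needed.
torch._inductor.config.reorder_for_compute_comm_overlap_passes = [
    "sink_waits_iterative",
    "reorder_communication_preserving_peak_memory",
]
```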

torch._inductor.config.allow_buffer_reuse = False
Contributor

i think since this will change the behavior of compile for non-simplefsdp cases, we need to understand the impact before we can land. Or, we can introduce a new option for compile_and_optimize_comms that bundles all these things together.

In my autoparallel branch I preferred to expose the manual controls directly, mostly to allow experimentation. But for general consumption i think it is nicer to enable the right 'recipe' with a simple switch, so i like your approach.

Contributor

i think since this will change the behavior of compile for non-simplefsdp cases

@wconstab
This is in the simple_fsdp folder for SimpleFSDP case alone. Why do you think it would change the behavior of non-simplefsdp cases?

Contributor

oops, i was just not paying attention to which file this was, i assumed it was the core parallelize file. disregard my comment.


torch._inductor.config.bucket_all_gathers_fx = "fsdp"
torch._inductor.config.bucket_reduce_scatters_fx = "fsdp"
torch._inductor.config.reorder_for_compute_comm_overlap = True
@ruisizhang123 (Contributor) Jul 25, 2025

Thank you for adding this. It would be nice if you could also update the readme and detail what these configs are doing!

@tianyu-l (Contributor) left a comment

Please see inline comments. I think it'd be good if we can keep the passes configurable.


Comment on lines +108 to +118
torch._inductor.config.allow_buffer_reuse = False
torch._inductor.config.reorder_for_peak_memory = False
torch._inductor.config.reorder_for_compute_comm_overlap = False

torch._inductor.config.bucket_all_gathers_fx = "fsdp"
torch._inductor.config.bucket_reduce_scatters_fx = "fsdp"
torch._inductor.config.reorder_for_compute_comm_overlap = True
torch._inductor.config.reorder_for_compute_comm_overlap_passes = [
sink_waits_iterative,
reorder_communication_preserving_peak_memory,
]
Contributor

I think we should make this configurable. Although I agree they can be turned on by default, I think it's still valuable if users can turn these passes off:

  1. Some researchers may appreciate a plain graph without reordering. This was one of the original motivations for adding this experiment here.
  2. The passes added here may not always be stable. If they cause failures, we should have a fallback that leaves them disabled.
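One way to provide such a fallback is to compute the Inductor overrides behind a single toggle, so turning it off leaves the compiled graph untouched. A minimal sketch, assuming a hypothetical `optimize_comms` boolean (not an existing torchtitan option); the pass names are passed as strings per the earlier nit:

```python
def comm_optimization_settings(optimize_comms: bool) -> dict:
    """Return torch._inductor.config overrides for the SimpleFSDP
    comm-optimization recipe, or an empty dict when disabled."""
    if not optimize_comms:
        return {}  # fallback: plain graph, no bucketing or reordering passes
    return {
        "allow_buffer_reuse": False,
        "reorder_for_peak_memory": False,
        "bucket_all_gathers_fx": "fsdp",
        "bucket_reduce_scatters_fx": "fsdp",
        "reorder_for_compute_comm_overlap": True,
        "reorder_for_compute_comm_overlap_passes": [
            "sink_waits_iterative",
            "reorder_communication_preserving_peak_memory",
        ],
    }

# Applying the overrides (requires torch):
# for name, value in comm_optimization_settings(True).items():
#     setattr(torch._inductor.config, name, value)
```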

@IvanKobzarev
Contributor Author

Please see inline comments. I think it'd be good if we can keep the passes configurable.

Will already added options in the config manager - I will close this PR

@tianyu-l
Contributor

@IvanKobzarev

Will already added options in the config manager - I will close this PR

Could you elaborate this a bit? I haven't seen @wconstab 's change in the simple_fsdp folder of torchtitan. Did he add it somewhere else which I'm not aware of?

Also for future reference, for such changes please consider presenting a performance report in the PR summary.

@IvanKobzarev
Contributor Author

@IvanKobzarev

Will already added options in the config manager - I will close this PR

Could you elaborate this a bit? I haven't seen @wconstab 's change in the simple_fsdp folder of torchtitan. Did he add it somewhere else which I'm not aware of?

Also for future reference, for such changes please consider presenting a performance report in the PR summary.

Sorry, that was not to main, that was only to the autoparallel branch - 68476b3

So for SimpleFSDP, porting this to main should work too.
