
[RFC] Do we really need all_reduce in BaseLossContext? #1335

@nil0x9

Description

Currently in the loss calculation, there is an invocation of all-reduce with autograd:

loss = all_reduce(loss, op=dist.ReduceOp.SUM, group=dist.group.WORLD)

IIUC this all-reduce is primarily for logging purposes. It could potentially be removed safely (please correct me if I'm wrong here), because:

  1. FSDP works fine with just the local loss;
  2. with the above reduced loss, the gradient in the backward pass would be identical to that of the local loss; and
  3. the loss is reduced again in TrainEngine (so it would be reduced twice! see the sketch after this list):
    dist.all_reduce(reduced_llm_loss.div_(dist.get_world_size()))
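For context, here is a minimal sketch of the double-reduction concern in item 3. This is not the actual project code path: torch.distributed.nn.functional.all_reduce stands in for the autograd-aware all_reduce quoted above, and local_loss plus the default WORLD group are assumptions for illustration.

    import torch
    import torch.distributed as dist
    # Differentiable collective, used here as a stand-in for the wrapper quoted above.
    from torch.distributed.nn.functional import all_reduce as autograd_all_reduce

    def double_reduction_demo(local_loss: torch.Tensor) -> torch.Tensor:
        # Hypothetical helper for illustration only.
        world_size = dist.get_world_size()

        # Reduction 1 (BaseLossContext, as quoted above): autograd-aware SUM,
        # after which every rank holds the global sum of per-rank losses.
        loss = autograd_all_reduce(
            local_loss, op=dist.ReduceOp.SUM, group=dist.group.WORLD
        )

        # Reduction 2 (TrainEngine, as quoted above): reduce again for logging.
        reduced_llm_loss = loss.detach().clone()
        dist.all_reduce(reduced_llm_loss.div_(world_size))

        # With both reductions applied, this equals sum(per-rank losses), i.e.
        # world_size times the mean that reduction 2 alone would produce from
        # the local loss.
        return reduced_llm_loss

If reduction 1 is dropped, reduction 2 alone already yields the global mean of the per-rank losses, which appears to be the intended logging value.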
