Currently, in the loss calculation, there is an invocation of all-reduce with autograd:
xtuner/xtuner/v1/loss/base_loss_ctx.py, line 156 (commit 7b1246a):
`loss = all_reduce(loss, op=dist.ReduceOp.SUM, group=dist.group.WORLD)`
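For context, that `all_reduce` is autograd-aware. Its actual implementation isn't quoted here, but a minimal sketch of an autograd-aware all-reduce with a pass-through backward (the behaviour assumed in the points below) could look like the following; `_AllReduceSum` and `all_reduce_sum` are made-up names for illustration:

```python
import torch
import torch.distributed as dist


class _AllReduceSum(torch.autograd.Function):
    """Hypothetical stand-in for an autograd-aware all-reduce.

    Forward sums the loss across ranks (for logging); backward passes the
    incoming gradient through unchanged, so the local gradient is the same
    as if the local loss had been used directly.
    """

    @staticmethod
    def forward(ctx, tensor, group):
        tensor = tensor.clone()
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group)
        return tensor

    @staticmethod
    def backward(ctx, grad_output):
        # Identity backward: no extra communication, no scaling.
        return grad_output, None


def all_reduce_sum(loss, group=None):
    return _AllReduceSum.apply(loss, group)
```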
IIUC this all-reduce is primarily for logging purposes. It could potentially be removed safely (please correct me if I'm wrong here), because:
- FSDP works fine with just the local loss;
- with the reduced loss above, the gradients produced during the backward pass would be identical to those of the local loss (see the check sketched below);
- the loss is reduced again in `TrainEngine` (so it would be reduced twice!):
xtuner/xtuner/v1/engine/train_engine.py, line 337 (commit 7b1246a):
`dist.all_reduce(reduced_llm_loss.div_(dist.get_world_size()))`
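As a sanity check, here is a small script (run with e.g. `torchrun --nproc_per_node=2 check.py`) that assumes the pass-through-backward wrapper `all_reduce_sum` sketched above; the toy model, data, and module name are made up for illustration. It checks that the parameter gradients from the reduced loss match those from the local loss, and shows that the subsequent divide-then-all-reduce in `TrainEngine` would log the sum of local losses rather than their mean:

```python
import torch
import torch.distributed as dist
import torch.nn as nn

# `all_reduce_sum` is the pass-through-backward wrapper sketched above,
# assumed here to be saved in a (hypothetical) allreduce_sketch.py.
from allreduce_sketch import all_reduce_sum


def main():
    dist.init_process_group("gloo")
    torch.manual_seed(0)                      # identical weights on every rank

    model = nn.Linear(4, 1)
    x = torch.randn(8, 4) + dist.get_rank()   # rank-dependent inputs
    y = torch.randn(8, 1)

    # 1) Backward from the plain local loss.
    local_loss = nn.functional.mse_loss(model(x), y)
    local_loss.backward()
    local_grads = [p.grad.clone() for p in model.parameters()]
    model.zero_grad()

    # 2) Backward from the all-reduced (summed) loss.
    loss = nn.functional.mse_loss(model(x), y)
    reduced = all_reduce_sum(loss)
    reduced.backward()

    for g, p in zip(local_grads, model.parameters()):
        assert torch.allclose(g, p.grad), "gradients differ"

    # 3) What TrainEngine's all_reduce(loss.div_(world_size)) would then log:
    #    sum_r(sum_s loss_s / W) == sum_s loss_s, i.e. the sum of local losses
    #    rather than their mean, because the loss has been reduced twice.
    logged = reduced.detach().clone().div_(dist.get_world_size())
    dist.all_reduce(logged)
    if dist.get_rank() == 0:
        print(f"local loss: {local_loss.item():.4f}, logged value: {logged.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```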