Commits (28)
91e7c8a  disable compute_stream when data parallel (L1aoXingyu, May 25, 2022)
fa82eac  change input and label placement at the beginning (L1aoXingyu, May 25, 2022)
64e3307  add env variable (L1aoXingyu, May 25, 2022)
82e39f7  global tensor to local in graph build (L1aoXingyu, May 25, 2022)
9ddc9a2  finish softmax fusion (L1aoXingyu, May 25, 2022)
7cff1a0  turn off evaluation (L1aoXingyu, May 25, 2022)
f6df295  add fused_scale_mask_softmax_dropout in bert and t5 (L1aoXingyu, May 25, 2022)
14c2231  use graph block to set stage by placement (chengtbf, May 26, 2022)
718e9ba  fix non-decreasing loss in multihead_fusion (L1aoXingyu, May 27, 2022)
3709b6d  Merge branch 'lxy_libai_bench' of https://github.com/Oneflow-Inc/liba… (L1aoXingyu, May 27, 2022)
f0d2199  add set_stage for all libai models: resmlp, swin-t, t5, vit (chengtbf, May 27, 2022)
4ea8f5f  Merge branch 'lxy_libai_bench' of https://github.com/Oneflow-Inc/liba… (chengtbf, May 27, 2022)
d53fde5  remove expand-by-broadcast in softmax dropout (chengtbf, May 27, 2022)
c91e39b  pull master and fix conflicts (chengtbf, May 31, 2022)
c2ddf10  Merge branch 'lxy_libai_bench' of github.com:Oneflow-Inc/libai into l… (chengtbf, May 31, 2022)
90706bd  fix SBP for 2-D parallelism in loss, cls_head, attention all-gather and all2all (chengtbf, Jun 2, 2022)
258dba7  change pipeline_num_layers in dist (#296) (CPFLAME, Jun 7, 2022)
31e5c40  Merge branch 'main' of github.com:Oneflow-Inc/libai into lxy_libai_bench (chengtbf, Jun 7, 2022)
87e10cf  fuse optimizer and fp16 cast (chengtbf, Jun 8, 2022)
2ebab25  disable fuse optim cast for zzk correctness bug fix (chengtbf, Jun 13, 2022)
1c30c50  delete causal mask in gpt init && use fused tri in attention (CPFLAME, Jun 13, 2022)
85c188e  Merge branch 'lxy_libai_bench' of github.com:Oneflow-Inc/libai into l… (CPFLAME, Jun 13, 2022)
8fafd11  init RDMA after dataloader (chengtbf, Jun 13, 2022)
c851591  refine ZeRO config in graph base (chengtbf, Jun 13, 2022)
7001a2c  refine RDMA && add persistent_workers in dataloader (CPFLAME, Jun 14, 2022)
6230697  delete RDMA in graph trainer (CPFLAME, Jun 14, 2022)
4eb6efd  Merge branch 'main' into lxy_libai_bench (chengtbf, Jun 17, 2022)
e0efe7d  fix import dist (chengtbf, Jun 18, 2022)
change pipeline_num_layers in dist (#296)
CPFLAME authored Jun 7, 2022
commit 258dba7bd550c256c5fbacfeab928b90caea1e18
11 changes: 11 additions & 0 deletions libai/utils/distributed.py
@@ -125,6 +125,17 @@ def _init_placement_group(self, cfg):
            for i in range(0, self.world_size, num_devices_per_stage)
        ]

        # change pipeline_num_layers to make the middle stages contain more layers
        if (
            self._pipeline_parallel_size >= 4
            and cfg.pipeline_num_layers >= 8
            and cfg.pipeline_num_layers % self._pipeline_parallel_size == 0
        ):
            temp_num_layers_per_stage = cfg.pipeline_num_layers // self._pipeline_parallel_size
            cfg.pipeline_num_layers += min(
                self._pipeline_parallel_size - 1, temp_num_layers_per_stage
            )

        num_layers_per_stage = cfg.pipeline_num_layers // self._pipeline_parallel_size
        stage_offset = cfg.pipeline_num_layers % self._pipeline_parallel_size

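To make the effect of this hunk concrete, here is a minimal standalone sketch of its arithmetic; the `adjusted_layers` helper is hypothetical, not a libai API, and how the resulting `stage_offset` is consumed to shift the extra layers onto the middle stages happens downstream of this diff.

```python
# Minimal sketch of the padding heuristic above (hypothetical helper,
# not part of libai).

def adjusted_layers(pipeline_parallel_size: int, pipeline_num_layers: int):
    """Reproduce the rebalancing arithmetic from _init_placement_group."""
    # Only rebalance deep-enough pipelines whose layer count divides
    # evenly across the stages (the same guard as in the diff).
    if (
        pipeline_parallel_size >= 4
        and pipeline_num_layers >= 8
        and pipeline_num_layers % pipeline_parallel_size == 0
    ):
        layers_per_stage = pipeline_num_layers // pipeline_parallel_size
        # Pad by at most (stages - 1) extra "virtual" layers.
        pipeline_num_layers += min(pipeline_parallel_size - 1, layers_per_stage)
    num_layers_per_stage = pipeline_num_layers // pipeline_parallel_size
    stage_offset = pipeline_num_layers % pipeline_parallel_size
    return num_layers_per_stage, stage_offset


# 24 layers over 4 stages: padded to 24 + min(3, 6) = 27, so the division
# yields 6 layers per stage with a remainder of 3 instead of a flat 6/6/6/6.
print(adjusted_layers(4, 24))  # -> (6, 3)
```

Note that the pad is always strictly less than the stage count, so when the guard fires, `num_layers_per_stage` keeps its original value and the entire pad surfaces as `stage_offset`.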