@ShawnXuan commented on Sep 10, 2024

Getting Started

Prepare the Data and Vocabulary

  • We have provided the relevant datasets, which can be downloaded from the following links:
  1. Vocabulary JSON
  2. Merges File
  3. Binary Data File
  4. Index Data File
  • Download the dataset and organize the files into a directory. The folder structure should look like this:
$ tree path/to/gpt_data
path/to/gpt_data
├── gpt2-vocab.json
├── gpt2-merges.txt
├── loss_compara_content_sentence.bin
└── loss_compara_content_sentence.idx
  • Ensure gpt_data is in the location the training config expects, using one of the options below (a sanity-check sketch follows this list):

    • Option 1: Create a symbolic link

      # Assuming the working directory is the root of the libai project
      mkdir data_test
      ln -s /path/to/gpt_data data_test/gpt_data
      # ln -s /data0/datasets/gpt2 data_test/gpt_data # on our 910B
    • Option 2: Modify the configuration file configs/gpt2_pretrain.py

      Adjust the following configuration based on your specific environment:

      vocab_file = "./data_test/gpt_data/gpt2-vocab.json"
      merge_files = "./data_test/gpt_data/gpt2-merges.txt"
      data_prefix = "./data_test/gpt_data/loss_compara_content_sentence"
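A quick sanity check before launching training can catch path mistakes early. The snippet below is only an illustrative sketch and is not part of libai; the ./data_test/gpt_data path assumes the symbolic link from Option 1, so adjust it to your own layout.

# Illustrative sanity check (not part of libai): confirm the expected GPT-2
# data files exist before launching training.
import os

data_dir = "./data_test/gpt_data"  # assumed location from Option 1; adjust as needed
expected = [
    "gpt2-vocab.json",
    "gpt2-merges.txt",
    "loss_compara_content_sentence.bin",
    "loss_compara_content_sentence.idx",
]
missing = [name for name in expected if not os.path.isfile(os.path.join(data_dir, name))]
if missing:
    raise FileNotFoundError(f"Missing files under {data_dir}: {missing}")
print(f"Found all expected GPT-2 data files under {data_dir}")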

How to Train the GPT-2 Model on NPU/XPU

python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
        tools/train_net.py --config-file=configs/gpt2_pretrain.py \
            graph.enabled=False \
            train.input_placement_device="npu" \
            train.dist.device_type="npu" \
            train.amp.enabled=False \
            model.cfg.scale_mask_softmax_fusion=False \
            model.cfg.bias_gelu_fusion=False

If you want to train on XPU, change 'npu' to 'xpu' in both device settings of the command above.
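For convenience, the launch can also be wrapped in a small script that takes the device type as an argument, so switching between NPU and XPU does not require editing the command. This is only a sketch that mirrors the flags shown above; it is not shipped with libai.

# Illustrative launcher wrapper (not part of libai): pass "npu" or "xpu" on the
# command line and the same training command shown above is executed.
# The quotes around the device values in the shell command are stripped by the
# shell, so passing the bare value here is equivalent.
import subprocess
import sys

device = sys.argv[1] if len(sys.argv) > 1 else "npu"  # "npu" or "xpu"
cmd = [
    "python3", "-m", "oneflow.distributed.launch",
    "--nproc_per_node", "1",
    "--nnodes", "1",
    "--node_rank", "0",
    "--master_addr", "127.0.0.1",
    "--master_port", "12345",
    "tools/train_net.py", "--config-file=configs/gpt2_pretrain.py",
    "graph.enabled=False",
    f"train.input_placement_device={device}",
    f"train.dist.device_type={device}",
    "train.amp.enabled=False",
    "model.cfg.scale_mask_softmax_fusion=False",
    "model.cfg.bias_gelu_fusion=False",
]
subprocess.run(cmd, check=True)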

@ShawnXuan (author) commented:

NPU (910B3)

[09/10 10:37:57 libai]: >>> done with building model. Building time: 0.282 seconds
WARNING [09/10 10:37:57 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:38:03 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:40:56 lb.utils.events]:  eta: 21:00:38  iteration: 19/10000  consumed_samples: 80  total_loss: 9.895  time: 7.5187 s/iter  data_time: 0.0021 s/iter total_throughput: 0.53 samples/s lr: 1.50e-04
[09/10 10:43:32 lb.utils.events]:  eta: 21:05:47  iteration: 39/10000  consumed_samples: 160  total_loss: 9.027  time: 7.6572 s/iter  data_time: 0.0019 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:46:05 lb.utils.events]:  eta: 21:06:05  iteration: 59/10000  consumed_samples: 240  total_loss: 8.362  time: 7.6549 s/iter  data_time: 0.0015 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:48:42 lb.utils.events]:  eta: 21:08:55  iteration: 79/10000  consumed_samples: 320  total_loss: 7.847  time: 7.7127 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:51:22 lb.utils.events]:  eta: 21:18:52  iteration: 99/10000  consumed_samples: 400  total_loss: 7.628  time: 7.7640 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:53:53 lb.utils.events]:  eta: 21:04:10  iteration: 119/10000  consumed_samples: 480  total_loss: 7.441  time: 7.7314 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04

CUDA (A100)

[09/10 10:50:47 libai]: >>> done with building model. Building time: 5.722 seconds
WARNING [09/10 10:50:47 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:50:50 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:50:54 lb.utils.events]:  eta: 0:10:15  iteration: 19/10000  consumed_samples: 80  total_loss: 9.83  time: 0.0689 s/iter  data_time: 0.0008 s/iter total_throughput: 58.05 samples/s lr: 1.50e-04
[09/10 10:50:58 lb.utils.events]:  eta: 0:10:15  iteration: 39/10000  consumed_samples: 160  total_loss: 9.122  time: 0.1458 s/iter  data_time: 0.0007 s/iter total_throughput: 27.43 samples/s lr: 1.50e-04
[09/10 10:51:00 lb.utils.events]:  eta: 0:10:12  iteration: 59/10000  consumed_samples: 240  total_loss: 8.388  time: 0.1214 s/iter  data_time: 0.0007 s/iter total_throughput: 32.94 samples/s lr: 1.50e-04
[09/10 10:51:03 lb.utils.events]:  eta: 0:10:11  iteration: 79/10000  consumed_samples: 320  total_loss: 8.019  time: 0.1357 s/iter  data_time: 0.0008 s/iter total_throughput: 29.48 samples/s lr: 1.50e-04
[09/10 10:51:05 lb.utils.events]:  eta: 0:10:09  iteration: 99/10000  consumed_samples: 400  total_loss: 7.635  time: 0.1232 s/iter  data_time: 0.0008 s/iter total_throughput: 32.47 samples/s lr: 1.50e-04
[09/10 10:51:06 lb.utils.events]:  eta: 0:10:09  iteration: 119/10000  consumed_samples: 480  total_loss: 7.461  time: 0.1132 s/iter  data_time: 0.0008 s/iter total_throughput: 35.34 samples/s lr: 1.50e-04
[09/10 10:51:08 lb.utils.events]:  eta: 0:10:09  iteration: 139/10000  consumed_samples: 560  total_loss: 7.367  time: 0.1061 s/iter  data_time: 0.0009 s/iter total_throughput: 37.72 samples/s lr: 1.50e-04
[09/10 10:51:09 lb.utils.events]:  eta: 0:10:06  iteration: 159/10000  consumed_samples: 640  total_loss: 7.305  time: 0.1003 s/iter  data_time: 0.0008 s/iter total_throughput: 39.88 samples/s lr: 1.50e-04
[09/10 10:51:10 lb.utils.events]:  eta: 0:10:04  iteration: 179/10000  consumed_samples: 720  total_loss: 7.214  time: 0.0975 s/iter  data_time: 0.0008 s/iter total_throughput: 41.02 samples/s lr: 1.50e-04
[09/10 10:51:12 lb.utils.events]:  eta: 0:10:03  iteration: 199/10000  consumed_samples: 800  total_loss: 7.132  time: 0.0940 s/iter  data_time: 0.0007 s/iter total_throughput: 42.55 samples/s lr: 1.50e-04
[09/10 10:51:13 lb.utils.events]:  eta: 0:10:02  iteration: 219/10000  consumed_samples: 880  total_loss: 6.986  time: 0.0911 s/iter  data_time: 0.0008 s/iter total_throughput: 43.93 samples/s lr: 1.50e-04
[09/10 10:51:14 lb.utils.events]:  eta: 0:10:01  iteration: 239/10000  consumed_samples: 960  total_loss: 6.866  time: 0.0886 s/iter  data_time: 0.0009 s/iter total_throughput: 45.15 samples/s lr: 1.50e-04
[09/10 10:51:18 lb.utils.events]:  eta: 0:10:00  iteration: 259/10000  consumed_samples: 1040  total_loss: 6.764  time: 0.0958 s/iter  data_time: 0.0008 s/iter total_throughput: 41.74 samples/s lr: 1.50e-04
[09/10 10:51:19 lb.utils.events]:  eta: 0:09:58  iteration: 279/10000  consumed_samples: 1120  total_loss: 6.655  time: 0.0933 s/iter  data_time: 0.0008 s/iter total_throughput: 42.85 samples/s lr: 1.50e-04
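To compare the two runs, it is enough to average the total_throughput values printed by lb.utils.events. The helper below is only a sketch: the placeholder strings are meant to be replaced with the log lines pasted above.

# Rough throughput comparison (sketch): extract "total_throughput: X samples/s"
# from pasted lb.utils.events lines and average the values.
import re

npu_log = """paste the NPU (910B3) log lines here"""
cuda_log = """paste the CUDA (A100) log lines here"""

def mean_throughput(log_text):
    values = [float(v) for v in re.findall(r"total_throughput:\s*([\d.]+)\s*samples/s", log_text)]
    return sum(values) / len(values) if values else float("nan")

print("NPU  mean throughput :", mean_throughput(npu_log), "samples/s")
print("A100 mean throughput :", mean_throughput(cuda_log), "samples/s")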

@ShawnXuan merged commit 13056f4 into main on Sep 13, 2024
@ShawnXuan deleted the gpt_devices branch on Sep 13, 2024 at 02:51