@ShawnXuan commented on Sep 10, 2024

Getting Started

Prepare the Data and Vocabulary

  • We have provided the relevant datasets, which can be downloaded from the following links:
  1. Vocabulary JSON
  2. Merges File
  3. Binary Data File
  4. Index Data File
  • Download the dataset and organize the files into a directory. The folder structure should look like this:
$ tree path/to/gpt_data
path/to/gpt_data
├── gpt2-vocab.json
├── gpt2-merges.txt
├── loss_compara_content_sentence.bin
└── loss_compara_content_sentence.idx
  • Ensure gpt_data is in the location the training config expects, using one of the options below (a sanity-check sketch follows this list):

    • Option 1: Create a symbolic link

      # Assuming the working directory is the root of the libai project
      mkdir data_test
      ln -s /path/to/gpt_data data_test/gpt_data
      # ln -s /data0/datasets/gpt2 data_test/gpt_data # on our 910B
    • Option 2: Modify the configuration file configs/gpt2_pretrain.py

      Adjust the following configuration based on your specific environment:

      vocab_file = "./data_test/gpt_data/gpt2-vocab.json"
      merge_files = "./data_test/gpt_data/gpt2-merges.txt"
      data_prefix = "./data_test/gpt_data/loss_compara_content_sentence"
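A quick sanity check before launching training can catch path mistakes early. The snippet below is only an illustrative sketch and is not part of libai; the ./data_test/gpt_data path assumes the symbolic link from Option 1, so adjust it to your own layout.

# Illustrative sanity check (not part of libai): confirm the expected GPT-2
# data files exist before launching training.
import os

data_dir = "./data_test/gpt_data"  # assumed location from Option 1; adjust as needed
expected = [
    "gpt2-vocab.json",
    "gpt2-merges.txt",
    "loss_compara_content_sentence.bin",
    "loss_compara_content_sentence.idx",
]
missing = [name for name in expected if not os.path.isfile(os.path.join(data_dir, name))]
if missing:
    raise FileNotFoundError(f"Missing files under {data_dir}: {missing}")
print(f"Found all expected GPT-2 data files under {data_dir}")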

How to Train the GPT-2 Model on NPU/XPU

python3 -m oneflow.distributed.launch \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 12345 \
        tools/train_net.py --config-file=configs/gpt2_pretrain.py \
            graph.enabled=False \
            train.input_placement_device="npu" \
            train.dist.device_type="npu" \
            train.amp.enabled=False \
            model.cfg.scale_mask_softmax_fusion=False \
            model.cfg.bias_gelu_fusion=False

If you want to train on XPU, change 'npu' to 'xpu' in both device settings of the command above.
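For convenience, the launch can also be wrapped in a small script that takes the device type as an argument, so switching between NPU and XPU does not require editing the command. This is only a sketch that mirrors the flags shown above; it is not shipped with libai.

# Illustrative launcher wrapper (not part of libai): pass "npu" or "xpu" on the
# command line and the same training command shown above is executed.
# The quotes around the device values in the shell command are stripped by the
# shell, so passing the bare value here is equivalent.
import subprocess
import sys

device = sys.argv[1] if len(sys.argv) > 1 else "npu"  # "npu" or "xpu"
cmd = [
    "python3", "-m", "oneflow.distributed.launch",
    "--nproc_per_node", "1",
    "--nnodes", "1",
    "--node_rank", "0",
    "--master_addr", "127.0.0.1",
    "--master_port", "12345",
    "tools/train_net.py", "--config-file=configs/gpt2_pretrain.py",
    "graph.enabled=False",
    f"train.input_placement_device={device}",
    f"train.dist.device_type={device}",
    "train.amp.enabled=False",
    "model.cfg.scale_mask_softmax_fusion=False",
    "model.cfg.bias_gelu_fusion=False",
]
subprocess.run(cmd, check=True)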

@ShawnXuan (author) commented:

NPU (910B3)

[09/10 10:37:57 libai]: >>> done with building model. Building time: 0.282 seconds
WARNING [09/10 10:37:57 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:38:03 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:40:56 lb.utils.events]:  eta: 21:00:38  iteration: 19/10000  consumed_samples: 80  total_loss: 9.895  time: 7.5187 s/iter  data_time: 0.0021 s/iter total_throughput: 0.53 samples/s lr: 1.50e-04
[09/10 10:43:32 lb.utils.events]:  eta: 21:05:47  iteration: 39/10000  consumed_samples: 160  total_loss: 9.027  time: 7.6572 s/iter  data_time: 0.0019 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:46:05 lb.utils.events]:  eta: 21:06:05  iteration: 59/10000  consumed_samples: 240  total_loss: 8.362  time: 7.6549 s/iter  data_time: 0.0015 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:48:42 lb.utils.events]:  eta: 21:08:55  iteration: 79/10000  consumed_samples: 320  total_loss: 7.847  time: 7.7127 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:51:22 lb.utils.events]:  eta: 21:18:52  iteration: 99/10000  consumed_samples: 400  total_loss: 7.628  time: 7.7640 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04
[09/10 10:53:53 lb.utils.events]:  eta: 21:04:10  iteration: 119/10000  consumed_samples: 480  total_loss: 7.441  time: 7.7314 s/iter  data_time: 0.0013 s/iter total_throughput: 0.52 samples/s lr: 1.50e-04

CUDA (A100)

[09/10 10:50:47 libai]: >>> done with building model. Building time: 5.722 seconds
WARNING [09/10 10:50:47 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR
[09/10 10:50:50 lb.engine.trainer]: Starting training from iteration 0
[09/10 10:50:54 lb.utils.events]:  eta: 0:10:15  iteration: 19/10000  consumed_samples: 80  total_loss: 9.83  time: 0.0689 s/iter  data_time: 0.0008 s/iter total_throughput: 58.05 samples/s lr: 1.50e-04
[09/10 10:50:58 lb.utils.events]:  eta: 0:10:15  iteration: 39/10000  consumed_samples: 160  total_loss: 9.122  time: 0.1458 s/iter  data_time: 0.0007 s/iter total_throughput: 27.43 samples/s lr: 1.50e-04
[09/10 10:51:00 lb.utils.events]:  eta: 0:10:12  iteration: 59/10000  consumed_samples: 240  total_loss: 8.388  time: 0.1214 s/iter  data_time: 0.0007 s/iter total_throughput: 32.94 samples/s lr: 1.50e-04
[09/10 10:51:03 lb.utils.events]:  eta: 0:10:11  iteration: 79/10000  consumed_samples: 320  total_loss: 8.019  time: 0.1357 s/iter  data_time: 0.0008 s/iter total_throughput: 29.48 samples/s lr: 1.50e-04
[09/10 10:51:05 lb.utils.events]:  eta: 0:10:09  iteration: 99/10000  consumed_samples: 400  total_loss: 7.635  time: 0.1232 s/iter  data_time: 0.0008 s/iter total_throughput: 32.47 samples/s lr: 1.50e-04
[09/10 10:51:06 lb.utils.events]:  eta: 0:10:09  iteration: 119/10000  consumed_samples: 480  total_loss: 7.461  time: 0.1132 s/iter  data_time: 0.0008 s/iter total_throughput: 35.34 samples/s lr: 1.50e-04
[09/10 10:51:08 lb.utils.events]:  eta: 0:10:09  iteration: 139/10000  consumed_samples: 560  total_loss: 7.367  time: 0.1061 s/iter  data_time: 0.0009 s/iter total_throughput: 37.72 samples/s lr: 1.50e-04
[09/10 10:51:09 lb.utils.events]:  eta: 0:10:06  iteration: 159/10000  consumed_samples: 640  total_loss: 7.305  time: 0.1003 s/iter  data_time: 0.0008 s/iter total_throughput: 39.88 samples/s lr: 1.50e-04
[09/10 10:51:10 lb.utils.events]:  eta: 0:10:04  iteration: 179/10000  consumed_samples: 720  total_loss: 7.214  time: 0.0975 s/iter  data_time: 0.0008 s/iter total_throughput: 41.02 samples/s lr: 1.50e-04
[09/10 10:51:12 lb.utils.events]:  eta: 0:10:03  iteration: 199/10000  consumed_samples: 800  total_loss: 7.132  time: 0.0940 s/iter  data_time: 0.0007 s/iter total_throughput: 42.55 samples/s lr: 1.50e-04
[09/10 10:51:13 lb.utils.events]:  eta: 0:10:02  iteration: 219/10000  consumed_samples: 880  total_loss: 6.986  time: 0.0911 s/iter  data_time: 0.0008 s/iter total_throughput: 43.93 samples/s lr: 1.50e-04
[09/10 10:51:14 lb.utils.events]:  eta: 0:10:01  iteration: 239/10000  consumed_samples: 960  total_loss: 6.866  time: 0.0886 s/iter  data_time: 0.0009 s/iter total_throughput: 45.15 samples/s lr: 1.50e-04
[09/10 10:51:18 lb.utils.events]:  eta: 0:10:00  iteration: 259/10000  consumed_samples: 1040  total_loss: 6.764  time: 0.0958 s/iter  data_time: 0.0008 s/iter total_throughput: 41.74 samples/s lr: 1.50e-04
[09/10 10:51:19 lb.utils.events]:  eta: 0:09:58  iteration: 279/10000  consumed_samples: 1120  total_loss: 6.655  time: 0.0933 s/iter  data_time: 0.0008 s/iter total_throughput: 42.85 samples/s lr: 1.50e-04
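To compare the two runs, it is enough to average the total_throughput values printed by lb.utils.events. The helper below is only a sketch: the placeholder strings are meant to be replaced with the log lines pasted above.

# Rough throughput comparison (sketch): extract "total_throughput: X samples/s"
# from pasted lb.utils.events lines and average the values.
import re

npu_log = """paste the NPU (910B3) log lines here"""
cuda_log = """paste the CUDA (A100) log lines here"""

def mean_throughput(log_text):
    values = [float(v) for v in re.findall(r"total_throughput:\s*([\d.]+)\s*samples/s", log_text)]
    return sum(values) / len(values) if values else float("nan")

print("NPU  mean throughput :", mean_throughput(npu_log), "samples/s")
print("A100 mean throughput :", mean_throughput(cuda_log), "samples/s")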

@ShawnXuan merged commit 13056f4 into main on Sep 13, 2024
@ShawnXuan deleted the gpt_devices branch on Sep 13, 2024 at 02:51