This document provides instructions for pretraining the models of Yuan2.0.
Three models are provided; their main parameters are as follows:
| Model | Layer number | Hidden size | Attention heads |
|---|---|---|---|
| 2B | 24 | 2048 | 32 |
| 51B | 42 | 8192 | 64 |
| 102B | 84 | 8192 | 64 |
The scripts for the three models in Yuan2.0 are:
- 2.1B : pretrain_yuan2.0_2.1B.sh
- 51B : pretrain_yuan2.0_51B.sh
- 102B : pretrain_yuan2.0_102B.sh
An example script to run Yuan-2.1B pretraining is:

```
bash examples/pretrain_yuan2.0_2.1B.sh
```

Before running the script, the relevant arguments should be set correctly.
First, make any desired modifications, including setting the environment variables CHECKPOINT_PATH, DATA_PATH, TOKENIZER_MODEL_PATH, and TENSORBOARD_PATH.
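A minimal sketch of how these variables might be set before launching the script; the paths below are placeholders, not the values used in the released scripts:

```
# Placeholder paths -- replace with your own locations.
CHECKPOINT_PATH=./checkpoints/yuan2.0-2.1B     # where checkpoints are saved and loaded
DATA_PATH='1 /path/dataset'                    # weighted dataset list (see below)
TOKENIZER_MODEL_PATH=/path/tokenizer           # tokenizer model path
TENSORBOARD_PATH=./tensorboard/yuan2.0-2.1B    # TensorBoard log directory
```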
If the dataset path is:

```
/path/dataset.bin
```

then DATA_PATH can be set as:

```
#DATA_PATH='weight dataset_path'
DATA_PATH='1 /path/dataset'
```

For details on dataset preprocessing, see the documentation here.
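For example, two preprocessed datasets could be blended with different sampling weights, assuming the Megatron-LM style alternating weight/path syntax implied by the comment above (the paths and weights here are hypothetical):

```
# Hypothetical blend of two datasets: 70% from datasetA, 30% from datasetB.
DATA_PATH='0.7 /path/datasetA 0.3 /path/datasetB'
```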
A simple and efficient three-dimensional model-parallel approach can be controlled with the --tensor-model-parallel-size and --pipeline-model-parallel-size flags. If the --pipeline-model-parallel-method flag is set to block, the number of transformer layers for each pipeline stage should be specified with --pipeline-model-parallel-blocks, as sketched below.
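A hypothetical fragment of the training arguments under the block method; the numeric values are examples only, and the value format of --pipeline-model-parallel-blocks is an assumption rather than taken from the released scripts:

```
# Example only: split a 42-layer model (e.g. the 51B configuration) across
# 2 pipeline stages. The list format for --pipeline-model-parallel-blocks
# is an assumption.
       --tensor-model-parallel-size 8 \
       --pipeline-model-parallel-size 2 \
       --pipeline-model-parallel-method block \
       --pipeline-model-parallel-blocks '21,21' \
```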
Localized Filtering-based Attention (LFA) can be activated with the --use-lf-gate flag, and the --lf-conv2d-num-pad flag should be set to 1 for training and 0 for inference.
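A sketch of the corresponding arguments for a training run (use --lf-conv2d-num-pad 0 instead when running inference; whether --use-lf-gate takes an explicit value in the released scripts is not shown here):

```
# Enable LFA; set --lf-conv2d-num-pad to 0 for inference.
       --use-lf-gate \
       --lf-conv2d-num-pad 1 \
```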
The --use-distributed-optimizer and --recompute-method flags control memory usage during training.
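A sketch of possible memory-saving settings; the value shown for --recompute-method follows the Megatron-LM convention and is an assumption here:

```
# Shard optimizer state across data-parallel ranks and recompute activations.
# The value 'uniform' follows the Megatron-LM convention (assumption).
       --use-distributed-optimizer \
       --recompute-method uniform \
```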
Further command-line arguments are described in the source file arguments.py and in Megatron-LM.