Description
When applying black-box distillation (GAD, https://arxiv.org/abs/2511.10643) to Qwen3-VL training, the training is very slow, and I would like to ask why. Both the actor and the critic are Qwen3-VL-2B-Instruct.
```bash
TRAIN_BATCH_SIZE=256
VAL_BATCH_SIZE=100
MAX_PROMPT_LENGTH=4096
MAX_RESPONSE_LENGTH=3072
PPO_MINI_BATCH_SIZE=128
ROLLOUT_N=4

echo "data.train_batch_size=$TRAIN_BATCH_SIZE"
echo "actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE"
echo "actor_rollout_ref.rollout.n=$ROLLOUT_N"

# Learning rates
ACTOR_LR=1e-6
CRITIC_LR=1e-6

# ============= Inference engine configuration =============
GEN_TP=1

SAVE_HF_MODEL=${SAVE_HF_MODEL:-True}
if [ "${SAVE_HF_MODEL}" = "True" ]; then
    CHECKPOINT_CONTENTS="['model','hf_model','optimizer','extra']"
else
    CHECKPOINT_CONTENTS="['model','optimizer','extra']"
fi

# ============= Run training =============
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$TRAIN_FILES \
    data.train_batch_size=$TRAIN_BATCH_SIZE \
    data.val_files=$VAL_FILES \
    data.val_batch_size=$VAL_BATCH_SIZE \
    data.max_prompt_length=$MAX_PROMPT_LENGTH \
    data.max_response_length=$MAX_RESPONSE_LENGTH \
    data.filter_overlong_prompts=True \
    data.truncation=right \
    actor_rollout_ref.model.path=$MODEL_PATH \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
    critic.model.fsdp_config.model_dtype=bf16 \
    actor_rollout_ref.actor.optim.lr=$ACTOR_LR \
    actor_rollout_ref.actor.grad_clip=0.2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=$PPO_MINI_BATCH_SIZE \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24576 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.entropy_coeff=0.0 \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.actor.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \
    actor_rollout_ref.ref.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    critic.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.temperature=0.8 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.top_p=0.9 \
    actor_rollout_ref.rollout.n=$ROLLOUT_N \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.prompt_length=4096 \
    +actor_rollout_ref.rollout.repetition_penalty=1.05 \
    actor_rollout_ref.rollout.response_length=$MAX_RESPONSE_LENGTH \
    critic.model.path=$REWARD_MODEL_PATH \
    reward_model.use_reward_loop=False \
    critic.model.use_remove_padding=True \
    critic.model.enable_gradient_checkpointing=True \
    critic.use_dynamic_bsz=True \
    critic.optim.lr=$CRITIC_LR \
    critic.ppo_max_token_len_per_gpu=24576 \
    critic.grad_clip=0.2 \
    critic.enable=True \
    critic.model.fsdp_config.optimizer_offload=False \
    critic.checkpoint.save_contents=${CHECKPOINT_CONTENTS} \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.val_before_train=False \
    trainer.critic_warmup=0.01 \
    trainer.logger='["console"]' \
    trainer.n_gpus_per_node=$GPUs \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=-1 \
    trainer.default_hdfs_dir=null \
    trainer.total_epochs=1 \
    trainer.default_local_dir=$SAVE_DIR "$@"
```
My hardware is 8×H200 and the dataset has 40k samples, but training one epoch reportedly takes 30 hours.
Peak GPU memory usage is around 120 GB, but most of the time only about 60 GB is in use. I tried tuning several parameters, but it made no difference; training time stays around 28-30 hours.
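As a rough sanity check on these numbers (a sketch using only the figures above; the per-step token count is an upper bound, since not every rollout necessarily reaches the response-length cap):

```python
import math

# Figures reported in this issue
dataset_size = 40_000
train_batch_size = 256
rollout_n = 4
max_response_length = 3072
epoch_hours = 30
num_gpus = 8

steps_per_epoch = math.ceil(dataset_size / train_batch_size)   # 157 optimizer steps
rollouts_per_step = train_batch_size * rollout_n               # 1024 sampled responses
seconds_per_step = epoch_hours * 3600 / steps_per_epoch        # ~688 s (~11.5 min) per step

# Upper bound on generated tokens per step (every rollout hits the cap)
gen_tokens_per_step = rollouts_per_step * max_response_length  # ~3.1M tokens
tokens_per_sec_per_gpu = gen_tokens_per_step / seconds_per_step / num_gpus

print(steps_per_epoch, rollouts_per_step, round(seconds_per_step))
print(round(tokens_per_sec_per_gpu))
```

So even if rollout generation dominated the whole step, the implied decode throughput would only be on the order of a few hundred tokens/s per H200, which is why each step averaging ~11.5 minutes looks suspicious.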