[ci] feat: add profiling tests to vLLM ci #5215
Gary-cjy wants to merge 4 commits into verl-project:main
Code Review
This pull request adds a shell script for a profiling test. I've found two issues in the script that will keep it from working as intended. One is the use of a placeholder path for saving results, which is not suitable for a CI environment. The other is an unquoted variable assignment whose inner quotes the shell strips, so the variable does not hold the intended value. Both issues need to be fixed for the script to run correctly.
MODEL_ID=${MODEL_ID:-Qwen/Qwen2.5-0.5B-Instruct}
MODEL_PATH=${MODEL_PATH:-${HOME}/.cache/models/${MODEL_ID}}

SAVE_PATH="your_path"
The SAVE_PATH is set to a placeholder "your_path". This is not a valid directory and will cause the script to fail, especially in a CI environment where such a path is unlikely to exist. Please use a valid, descriptive path for the output data.
- SAVE_PATH="your_path"
+ SAVE_PATH="outputs/profile_qwen2_5_05b_grpo"
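A minimal sketch of one way to make the output location CI-friendly, assuming the script is happy to take an override from the environment the same way it already does for MODEL_ID. The default value is the path suggested above; the mkdir step is an addition that is not part of the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: reuse the ${VAR:-default} override pattern the script
# already applies to MODEL_ID and MODEL_PATH.
SAVE_PATH=${SAVE_PATH:-outputs/profile_qwen2_5_05b_grpo}

# Create the directory up front so later writes do not depend on it
# already existing on the CI runner.
mkdir -p "$SAVE_PATH"

echo "profiling data will be written to: $SAVE_PATH"
```

With this shape a CI job can still redirect the output by exporting SAVE_PATH before invoking the script.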
SAVE_PATH="your_path"
LEVEL="level1"
CONTENTS=['npu','cpu']
The assignment CONTENTS=['npu','cpu'] does not store what it appears to. The shell accepts it as a plain assignment, but quote removal strips the single quotes, so CONTENTS ends up holding [npu,cpu] rather than ['npu','cpu']. To assign the string representation of a list to the variable with its inner quotes intact, the whole value must be enclosed in double quotes.
- CONTENTS=['npu','cpu']
+ CONTENTS="['npu','cpu']"
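The difference between the two forms can be checked directly in bash. The snippet below is only an illustration; the variable names UNQUOTED and QUOTED are made up for the comparison and do not appear in the PR.

```shell
#!/usr/bin/env bash

# Unquoted value: the shell performs quote removal on the assignment,
# so the single quotes around npu and cpu are dropped.
UNQUOTED=['npu','cpu']

# Double-quoted value: the inner single quotes are kept literally.
QUOTED="['npu','cpu']"

printf 'unquoted: %s\n' "$UNQUOTED"   # prints: unquoted: [npu,cpu]
printf 'quoted:   %s\n' "$QUOTED"     # prints: quoted:   ['npu','cpu']
```

Whether the consumer of CONTENTS needs the inner quotes depends on how it parses the value, so the double-quoted form is the safer choice when the intent is to pass the list through unchanged.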
global_profiler.tool=npu \
global_profiler.steps=$PROFILE_STEPS \
global_profiler.save_path=$SAVE_PATH \
trainer.device=npu $@
trainer.device is not required; the device type can be detected automatically.
What does this PR do?
Checklist Before Starting
PR title format: [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
Multiple modules are written like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For breaking changes, add [BREAKING] to the beginning of the title.
Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run the pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
For CI, use the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If the PR involves the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.