[recipe] fix: Update the grpo training script for gpt-oss models by HJSang · Pull Request #3836 · verl-project/verl

HJSang · 2025-10-20T20:20:03Z

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

- Remove the customized Python package installation steps, since those packages are already supported.
- Add a reasoning_effort input.
- Recommend a batch-size setup that avoids MoE instability.

Checklist Before Starting

- Search for similar PRs. Paste at least one query link here: ...
- Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
  - {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
  - If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
  - {type} is in feat, fix, refactor, chore, test
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
  - Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Test offline: run a training job

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- Read the Contribute Guide.
- Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Add / Update the documentation.
- Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
- Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

gemini-code-assist

Code Review

This pull request updates the GRPO training script for gpt-oss models by removing obsolete package installation steps and tuning several training parameters. The changes include adjusting batch sizes for MoE stability, increasing the maximum response length, and adding a reasoning_effort parameter.

While the changes are generally positive, I have a critical concern regarding performance. The get_model.py script within run_gptoss_20b.sh configures the model to use attn_implementation="eager". This setting will persist in the saved model's config.json and will be used during the main FSDP training loop. For a large model like gpt-oss-20b, using the unoptimized eager attention mechanism will be significantly slower and more memory-intensive than optimized alternatives like Flash Attention, severely impacting training efficiency.

While eager might be required for the Mxfp4Config quantization with device_map="auto", it's detrimental for training. I recommend investigating if a more performant attention implementation like flash_attention_2 can be used. If eager is strictly necessary for the initial model download and quantization, consider patching the config.json file after the get_model.py script runs to switch to a faster implementation for the training stage. For example, you could add the following line to the shell script:

sed -i 's/"attn_implementation": "eager"/"attn_implementation": "flash_attention_2"/' "${HOME}/models/gpt-oss-20b-bf16/config.json"

This would ensure the training part of the script benefits from optimized attention, while the initial model preparation remains unchanged.
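The config patch suggested in the review could also be done in Python rather than with sed. The following is a minimal sketch, not part of the PR: the helper name patch_attn_implementation is hypothetical, and it assumes the setting is stored under a top-level "attn_implementation" key, as the sed command in the review implies. Parsing the JSON avoids depending on the exact spacing around the colon. Whether flash_attention_2 actually works for gpt-oss is not confirmed anywhere in this thread.

```python
import json
from pathlib import Path


def patch_attn_implementation(config_path, old="eager", new="flash_attention_2"):
    """Swap the attention setting in a saved config.json.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Assumption: the value lives under the top-level key named in the
    # review's sed command. If the key is missing or holds a different
    # value, leave the file exactly as it is.
    if config.get("attn_implementation") != old:
        return False

    config["attn_implementation"] = new
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

In the recipe this would be called once after get_model.py finishes, for example on the ${HOME}/models/gpt-oss-20b-bf16/config.json path used in the review. The boolean return value makes it easy to log whether the saved config actually contained the eager setting.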

Co-authored-by: Hejian Sang <hsang@linkedin.com>

update the script for gpt-oss grpo training

1219538

HJSang requested review from FightingZhen, PeterSH6, ji-huazhong and vermouth1992 as code owners October 20, 2025 20:20

gemini-code-assist bot reviewed Oct 20, 2025


HJSang mentioned this pull request Oct 21, 2025

Agentic RL Support in GPT-OSS #3794

Open

wuxibin89 approved these changes Oct 21, 2025


wuxibin89 merged commit 5f87fbc into verl-project:main Oct 21, 2025

wuxibin89 merged 1 commit into verl-project:main from HJSang:hejian/tpt_oss_gsm8k
