
Commit c907607: merge with main (2 parents: ac19891 + b669015)

File tree: 241 files changed (+6536, -1557 lines)


CONTRIBUTING.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@

# Contributing to verl
Thank you for considering a contribution to verl! We welcome contributions of any kind: bug fixes, enhancements, documentation improvements, or even just feedback. Whether you're an experienced developer or this is your first open-source project, your help is invaluable.

Your support can take many forms:

- Report issues or unexpected behaviors.
- Suggest or implement new features.
- Improve or expand documentation.
- Review pull requests and assist other contributors.
- Spread the word: share verl in blog posts, social media, or give the repo a ⭐.
## Finding Issues to Contribute

Looking for ways to dive in? Check out these issues:

- [Good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)
- [Call for contribution](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22call%20for%20contribution%22)

You can also follow the development plan and roadmap via the [RFC](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3ARFC) and [Roadmap](https://github.com/volcengine/verl/issues?q=state%3Aopen%20label%3A%22roadmap%22) issue labels.
## Developing

- **Python-only**: install verl via `pip install -e .[test,vllm]` or `pip install -e .[test,sglang]` and iterate quickly. For the full dependency setup, check out the verl [installation doc](https://verl.readthedocs.io/en/latest/start/install.html).
## Code Linting and Formatting

We rely on pre-commit to keep our code consistent. To set it up:

```bash
pip install pre-commit
pre-commit install
# for staged changes
pre-commit run
# for all files in the repo
# pre-commit run --all-files
```
## Testing

Our test suites run on GitHub Actions. Check these workflows for details:

- [GPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/gpu_unit_tests.yml)
- [CPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/cpu_unit_tests.yml)
- [vLLM tests](https://github.com/volcengine/verl/blob/main/.github/workflows/vllm.yml)
- [SGLang tests](https://github.com/volcengine/verl/blob/main/.github/workflows/sgl.yml)
### Adding CI tests

If possible, please add CI test(s) for your new feature:

1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc.).
2. Add the related path patterns to the `paths` section if they are not already included.
3. Minimize the workload of the test script(s) (see existing scripts for examples).
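As a rough illustration of step 2, a `paths` trigger in a workflow yml looks something like the following. This is only a sketch: the actual patterns in verl's workflows differ per test suite, and the specific globs below are assumptions.

```yaml
# Hypothetical excerpt of a workflow trigger; real verl workflows
# list their own path patterns.
on:
  pull_request:
    paths:
      - "verl/trainer/**"
      - "tests/trainer/**"
      - ".github/workflows/gpu_unit_tests.yml"
```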
## Building the Docs

```bash
# Ensure verl is on your PYTHONPATH, e.g.:
pip install -e .[test]

# Install documentation dependencies
pip install -r requirements-docs.txt

# Generate HTML docs
make clean
make html

# Preview locally
python -m http.server -d _build/html/
```
Open your browser at http://localhost:8000 to explore the docs.
## Pull Requests & Code Reviews

Thanks for submitting a PR! To streamline reviews:

- Follow our Pull Request Template for the title format and checklist.
- Adhere to our pre-commit lint rules and ensure all checks pass.
- Update docs for any user-facing changes.
- Add or update tests in the CI workflows, or explain why tests aren't applicable.
## License

See the [LICENSE](https://github.com/volcengine/verl/blob/main/LICENSE) file for full details.

## Thank You

We appreciate your contributions to verl. Your efforts help make the project stronger and more user-friendly. Happy coding!

README.md

Lines changed: 2 additions & 24 deletions
@@ -228,32 +228,10 @@ verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The
 - [RACRO](https://github.com/gyhdog99/RACRO2): Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/gyhdog99/RACRO2)
 
 and many more awesome work listed in [recipe](recipe/README.md).
-## Contribution Guide
-
-Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/710) and [good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) to see where you can contribute.
-
-### Code Linting and Formatting
-
-We use pre-commit to help improve code quality. To initialize pre-commit, run:
-
-```bash
-pip install pre-commit
-pre-commit install
-```
 
-To resolve CI errors locally, you can manually run pre-commit by:
-
-```bash
-pre-commit run
-```
-
-### Adding CI tests
-
-If possible, please add CI test(s) for your new feature:
+## Contribution Guide
 
-1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc).
-2. Add related path patterns to the `paths` section if not already included.
-3. Minimize the workload of the test script(s) (see existing scripts for examples).
+See [contributions guide](CONTRIBUTING.md)
 
 ## About [ByteDance Seed Team](https://team.doubao.com/)

docs/advance/checkpoint.rst

Lines changed: 10 additions & 7 deletions
@@ -1,3 +1,5 @@
+.. _checkpoint-page:
+
 Using Checkpoints to Support Fault Tolerance Training
 =====================================================
 
@@ -28,12 +30,14 @@ So the inner checkpoint structure of **FSDP** is like:
 checkpoints/${trainer.project_name}/${trainer.experiment_name}
 ├── global_steps_${i}
 │ ├── actor
-│ │ ├── huggingface  # default save config and tokenizer, save huggingface model if include ``hf_model`` in checkpoint.contents
+│ │ ├── huggingface  # default save config and tokenizer, save huggingface model if include ``hf_model`` in checkpoint.contents
+│ │ └── fsdp_config.json  # FSDP config file, including world_size and fsdp version
 │ │ ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
 │ ├── critic
 │ │ ├── huggingface
+│ │ └── fsdp_config.json
 │ │ ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
@@ -59,27 +63,26 @@ Convert FSDP and Megatron Checkpoints to HuggingFace Format Model
 -----------------------------------------------------------------
 
 We provide a tool to convert the FSDP and Megatron checkpoints to HuggingFace format model.
-The tool is located in ``verl/model_merger``.
+The tool is located in ``verl/model_merger``. For older versions of verl that don't include fsdp_config.json in checkpoints, you can use the legacy model merger located at ``verl/scripts/legacy_model_merger.py``.
 
 The script supports two main sub-commands: `merge` (to convert and save checkpoints) and `test` (to validate merged checkpoints against a reference model).
 The arguments for the `merge` sub-command are as follows:
 
 .. code:: bash
 
-   usage: python -m verl.model_merger merge [-h] --backend {fsdp,megatron} --local_dir LOCAL_DIR [--hf_model_path HF_MODEL_PATH]
-                                            [--tie-word-embedding] [--is-value-model] [--target_dir TARGET_DIR]
-                                            [--hf_upload_path HF_UPLOAD_PATH] [--private]
+   usage: python -m verl.model_merger merge [-h] --backend {fsdp,megatron} [--local_dir LOCAL_DIR] [--tie-word-embedding] [--is-value-model] [--use_cpu_initialization] [--target_dir TARGET_DIR]
+                                            [--hf_upload_path HF_UPLOAD_PATH] [--private]
 
   options:
     -h, --help            show this help message and exit
    --backend {fsdp,megatron}
                           The backend of the model
    --local_dir LOCAL_DIR
                           Path to the saved model checkpoints
-   --hf_model_path HF_MODEL_PATH
-                          (Deprecated) Path to the original Hugging Face model for config.
    --tie-word-embedding  Whether to tie word embedding weights (currently only Megatron supported)
    --is-value-model      Whether the model is a value model (currently only Megatron supported)
+   --use_cpu_initialization
+                          Whether to use CPU initialization for the model. This is useful for large models that cannot fit into GPU memory during initialization.
    --target_dir TARGET_DIR
                           Directory to save the merged huggingface model
    --hf_upload_path HF_UPLOAD_PATH
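The exact schema of ``fsdp_config.json`` is not documented here; as a sketch (the field names below are assumptions based on the tree comment above, which says the file records the world_size and fsdp version), reading it back is plain JSON:

```python
import json
import tempfile
from pathlib import Path

# Write a stand-in fsdp_config.json; the real file is produced by
# verl's FSDP checkpoint saving, and its field names may differ.
ckpt_dir = Path(tempfile.mkdtemp())
(ckpt_dir / "fsdp_config.json").write_text(
    json.dumps({"world_size": 8, "fsdp_version": 2})
)

# A merger-style tool can then recover how many ranks wrote shards.
cfg = json.loads((ckpt_dir / "fsdp_config.json").read_text())
assert cfg["world_size"] == 8  # number of model_world_size_*_rank_*.pt shards
```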

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ verl is fast with:
    start/quickstart
    start/multinode
    start/ray_debug_tutorial
+   start/more_resources
 
 .. toctree::
    :maxdepth: 2

docs/start/install.rst

Lines changed: 1 addition & 4 deletions
@@ -21,15 +21,12 @@ We recommend using **FSDP** backend to investigate, research and prototype different
 
 For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support `Megatron-LM v0.11 <https://github.com/NVIDIA/Megatron-LM/tree/v0.11.0>`_. The guide for using Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
 
-.. note::
-
-   verl directly supports megatron's `GPTModel` API on the main branch with mcore v0.11. For mcore v0.4 try `0.3.x branch <https://github.com/volcengine/verl/tree/v0.3.x>`_ instead.
 
 2. Inference:
 
 For inference, vllm 0.8.3 and later versions have been tested for stability. We recommend turning on env var `VLLM_USE_V1=1` for optimal performance.
 
-For SGLang, refer to the :doc:`SGLang Backend<../workers/sglang_worker>` for detailed installation and usage instructions. **SGLang offers better throughput and is under extensive development.** We encourage users to report any issues or provide feedback via the `SGLang Issue Tracker <https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/106>`_.
+For SGLang, refer to the :doc:`SGLang Backend<../workers/sglang_worker>` for detailed installation and usage instructions. SGLang rollout is under extensive development and offers many advanced features and optimizations. We encourage users to report any issues or provide feedback via the `SGLang Issue Tracker <https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/106>`_.
 
 For huggingface TGI integration, it is usually used for debugging and single GPU exploration.

docs/start/more_resources.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
More Resources
==============

Last updated: 06/30/2025.

- Introduction to verl (`Slides <https://tongyx361.github.io/blogs/posts/verl-intro>`_)
- verl Code Walkthrough (`Slides <https://tongyx361.github.io/blogs/posts/verl-tutorial>`_, `Talk in Chinese <https://hcqnc.xetlk.com/sl/3vACOK>`_)

docs/start/quickstart.rst

Lines changed: 12 additions & 3 deletions
@@ -70,15 +70,15 @@ answer from both the solution and model's output using regular
 expression matching. We assign a reward of 1 to correct
 answer, 0.0 to incorrect answer and 0 to no answer.
 
-For more details, please refer to `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.1/verl/utils/reward_score/gsm8k.py>`_.
+For more details, please refer to `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.4.1/verl/utils/reward_score/gsm8k.py>`_.
 
 **Training Script**
 
 Now let's run PPO training with the dataset and model above. [2]_
 
 Set the ``data.train_files``, ``data.val_files``, ``actor_rollout_ref.model.path`` and ``critic.model.path`` based on your dataset and model names or paths.
-You may set ``VERL_USE_MODELSCOPE=True`` to download models from modelscope instead of huggingface.
+You may set ``VERL_USE_MODELSCOPE=True`` to download models from `modelscope <https://www.modelscope.cn>`_ instead of `huggingface <https://huggingface.co>`_.
 
 .. code-block:: bash
 
@@ -118,7 +118,16 @@ You are expected to see the following logs, indicating training in progress. The
 
 Checkout :ref:`algo-baseline-page` for full training and validation logs for reference.
 
-The checkpoint is saved at the following dir by default: ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``
+The checkpoint is saved at the following dir by default: ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``. You can merge the saved checkpoints to huggingface model using ``verl.model_merger`` module, for example:
+
+.. code-block:: bash
+
+   python3 -m verl.model_merger merge \
+       --backend fsdp \
+       --local_dir checkpoints/${trainer.project_name}/${trainer.experiment_name}/global_step_1/actor \
+       --target_dir checkpoints/${trainer.project_name}/${trainer.experiment_name}/global_step_1/actor/huggingface
+
+For more details about checkpoint and model merging, please refer to :ref:`checkpoint-page`.
 
 To enable ``wandb`` for experiment tracking, set the following configs:

examples/data_preprocess/geo3k.py

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,8 @@
 
 instruction_following = (
     r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
-    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
+    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
+    r"The final answer MUST BE put in \boxed{}."
 )
 
 # add a row to each data item that represents a unique id
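The split above (repeated in the next file as well) relies on Python's implicit concatenation of adjacent string literals, so the resulting prompt is byte-for-byte identical to the one-line form; a quick check:

```python
# The one-line form (before the change) and the two-line form
# (after the change) produce the same string, because adjacent
# string literals are concatenated at compile time.
single_line = (
    r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
)
split_lines = (
    r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
    r"The final answer MUST BE put in \boxed{}."
)
assert single_line == split_lines  # the edit is purely cosmetic
```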

examples/data_preprocess/geo3k_multiturn_w_tool.py

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@
 test_dataset = dataset["test"]
 instruction_following = (
     r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
-    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
+    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
+    r"The final answer MUST BE put in \boxed{}."
 )
 
 # add a row to each data item that represents a unique id

examples/data_preprocess/gsm8k_multiturn_w_interaction.py

Lines changed: 5 additions & 1 deletion
@@ -63,7 +63,11 @@ def process_fn(example, idx):
 "prompt": [
     {
         "role": "system",
-        "content": ("You are a math expert. You are given a question and you need to solve it step by step. You should rethinking carefully if user point out your answer is wrong. Put your final answer in the format of `#### <answer>`."),
+        "content": (
+            "You are a math expert. You are given a question and you need to solve it step by step. "
+            "You should rethinking carefully if user point out your answer is wrong. "
+            "Put your final answer in the format of `#### <answer>`."
+        ),
     },
     {
         "role": "user",
