
Commit c907607: merge with main (2 parents: ac19891 + b669015)

File tree: 241 files changed (+6536, -1557 lines)


CONTRIBUTING.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@

# Contributing to verl
Thank you for considering a contribution to verl! We welcome contributions of any kind: bug fixes, enhancements, documentation improvements, or even just feedback. Whether you're an experienced developer or this is your first open-source project, your help is invaluable.

Your support can take many forms:

- Report issues or unexpected behaviors.
- Suggest or implement new features.
- Improve or expand documentation.
- Review pull requests and assist other contributors.
- Spread the word: share verl in blog posts, social media, or give the repo a ⭐.
## Finding Issues to Contribute

Looking for ways to dive in? Check out these issues:

- [Good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22)
- [Call for contribution](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22call%20for%20contribution%22)

You can also follow the development plan and roadmap via the [RFC](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3ARFC) and [Roadmap](https://github.com/volcengine/verl/issues?q=state%3Aopen%20label%3A%22roadmap%22) issue labels.
## Developing

- **Python-only**: install verl via `pip install -e .[test,vllm]` or `pip install -e .[test,sglang]` and iterate quickly. For the full dependency setup, check out the verl [installation doc](https://verl.readthedocs.io/en/latest/start/install.html).
## Code Linting and Formatting

We rely on pre-commit to keep our code consistent. To set it up:

```bash
pip install pre-commit
pre-commit install
# for staged changes
pre-commit run
# for all files in the repo
# pre-commit run --all-files
```
## Testing

Our test suites run on GitHub Actions. Check these workflows for details:

- [GPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/gpu_unit_tests.yml)
- [CPU unit tests](https://github.com/volcengine/verl/blob/main/.github/workflows/cpu_unit_tests.yml)
- [vLLM tests](https://github.com/volcengine/verl/blob/main/.github/workflows/vllm.yml)
- [SGLang tests](https://github.com/volcengine/verl/blob/main/.github/workflows/sgl.yml)
### Adding CI tests

If possible, please add CI test(s) for your new feature:

1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc.).
2. Add the related path patterns to the `paths` section if they are not already included.
3. Minimize the workload of the test script(s) (see existing scripts for examples).
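As a rough illustration of step 2, a `paths` trigger in a workflow yml looks something like the following. This is only a sketch: the actual patterns in verl's workflows differ per test suite, and the specific globs below are assumptions.

```yaml
# Hypothetical excerpt of a workflow trigger; real verl workflows
# list their own path patterns.
on:
  pull_request:
    paths:
      - "verl/trainer/**"
      - "tests/trainer/**"
      - ".github/workflows/gpu_unit_tests.yml"
```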
## Building the Docs

```bash
# Ensure verl is on your PYTHONPATH, e.g.:
pip install -e .[test]

# Install documentation dependencies
pip install -r requirements-docs.txt

# Generate HTML docs
make clean
make html

# Preview locally
python -m http.server -d _build/html/
```
Open your browser at http://localhost:8000 to explore the docs.
## Pull Requests & Code Reviews

Thanks for submitting a PR! To streamline reviews:

- Follow our Pull Request Template for the title format and checklist.
- Adhere to our pre-commit lint rules and ensure all checks pass.
- Update docs for any user-facing changes.
- Add or update tests in the CI workflows, or explain why tests aren't applicable.
## License

See the [LICENSE](https://github.com/volcengine/verl/blob/main/LICENSE) file for full details.

## Thank You

We appreciate your contributions to verl. Your efforts help make the project stronger and more user-friendly. Happy coding!

README.md

Lines changed: 2 additions & 24 deletions
@@ -228,32 +228,10 @@ verl is inspired by the design of Nemo-Aligner, Deepspeed-chat and OpenRLHF. The
 - [RACRO](https://github.com/gyhdog99/RACRO2): Build multi-modal reasoning models via decoupling it into query-conditioned captioning and text-only reasoning ![GitHub Repo stars](https://img.shields.io/github/stars/gyhdog99/RACRO2)
 
 and many more awesome work listed in [recipe](recipe/README.md).
-## Contribution Guide
-
-Contributions from the community are welcome! Please check out our [project roadmap](https://github.com/volcengine/verl/issues/710) and [good first issues](https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22) to see where you can contribute.
-
-### Code Linting and Formatting
-
-We use pre-commit to help improve code quality. To initialize pre-commit, run:
-
-```bash
-pip install pre-commit
-pre-commit install
-```
 
-To resolve CI errors locally, you can manually run pre-commit by:
-
-```bash
-pre-commit run
-```
-
-### Adding CI tests
-
-If possible, please add CI test(s) for your new feature:
+## Contribution Guide
 
-1. Find the most relevant workflow yml file, which usually corresponds to a `hydra` default config (e.g. `ppo_trainer`, `ppo_megatron_trainer`, `sft_trainer`, etc).
-2. Add related path patterns to the `paths` section if not already included.
-3. Minimize the workload of the test script(s) (see existing scripts for examples).
+See [contributions guide](CONTRIBUTING.md)
 
 ## About [ByteDance Seed Team](https://team.doubao.com/)

docs/advance/checkpoint.rst

Lines changed: 10 additions & 7 deletions
@@ -1,3 +1,5 @@
+.. _checkpoint-page:
+
 Using Checkpoints to Support Fault Tolerance Training
 =====================================================
 
@@ -28,12 +30,14 @@ So the inner checkpoint structure of **FSDP** is like:
 checkpoints/${trainer.project_name}/${trainer.experiment_name}
 ├── global_steps_${i}
 │ ├── actor
-│ │ ├── huggingface  # default save config and tokenizer, save huggingface model if include ``hf_model`` in checkpoint.contents
+│ │ ├── huggingface  # default save config and tokenizer, save huggingface model if include ``hf_model`` in checkpoint.contents
+│ │ └── fsdp_config.json  # FSDP config file, including world_size and fsdp version
 │ │ ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
 │ ├── critic
 │ │ ├── huggingface
+│ │ └── fsdp_config.json
 │ │ ├── model_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ ├── optim_world_size_{self.world_size}_rank_{self.rank}.pt
 │ │ └── extra_state_world_size_{self.world_size}_rank_{self.rank}.pt
@@ -59,27 +63,26 @@ Convert FSDP and Megatron Checkpoints to HuggingFace Format Model
 -----------------------------------------------------------------
 
 We provide a tool to convert the FSDP and Megatron checkpoints to HuggingFace format model.
-The tool is located in ``verl/model_merger``.
+The tool is located in ``verl/model_merger``. For older versions of verl that don't include fsdp_config.json in checkpoints, you can use the legacy model merger located at ``verl/scripts/legacy_model_merger.py``.
 
 The script supports two main sub-commands: `merge` (to convert and save checkpoints) and `test` (to validate merged checkpoints against a reference model).
 The arguments for the `merge` sub-command are as follows:
 
 .. code:: bash
 
-   usage: python -m verl.model_merger merge [-h] --backend {fsdp,megatron} --local_dir LOCAL_DIR [--hf_model_path HF_MODEL_PATH]
-                                            [--tie-word-embedding] [--is-value-model] [--target_dir TARGET_DIR]
-                                            [--hf_upload_path HF_UPLOAD_PATH] [--private]
+   usage: python -m verl.model_merger merge [-h] --backend {fsdp,megatron} [--local_dir LOCAL_DIR] [--tie-word-embedding] [--is-value-model] [--use_cpu_initialization] [--target_dir TARGET_DIR]
+                                            [--hf_upload_path HF_UPLOAD_PATH] [--private]
 
   options:
     -h, --help            show this help message and exit
    --backend {fsdp,megatron}
                           The backend of the model
    --local_dir LOCAL_DIR
                           Path to the saved model checkpoints
-   --hf_model_path HF_MODEL_PATH
-                          (Deprecated) Path to the original Hugging Face model for config.
    --tie-word-embedding  Whether to tie word embedding weights (currently only Megatron supported)
    --is-value-model      Whether the model is a value model (currently only Megatron supported)
+   --use_cpu_initialization
+                          Whether to use CPU initialization for the model. This is useful for large models that cannot fit into GPU memory during initialization.
    --target_dir TARGET_DIR
                           Directory to save the merged huggingface model
    --hf_upload_path HF_UPLOAD_PATH
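The exact schema of ``fsdp_config.json`` is not documented here; as a sketch (the field names below are assumptions based on the tree comment above, which says the file records the world_size and fsdp version), reading it back is plain JSON:

```python
import json
import tempfile
from pathlib import Path

# Write a stand-in fsdp_config.json; the real file is produced by
# verl's FSDP checkpoint saving, and its field names may differ.
ckpt_dir = Path(tempfile.mkdtemp())
(ckpt_dir / "fsdp_config.json").write_text(
    json.dumps({"world_size": 8, "fsdp_version": 2})
)

# A merger-style tool can then recover how many ranks wrote shards.
cfg = json.loads((ckpt_dir / "fsdp_config.json").read_text())
assert cfg["world_size"] == 8  # number of model_world_size_*_rank_*.pt shards
```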

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ verl is fast with:
    start/quickstart
    start/multinode
    start/ray_debug_tutorial
+   start/more_resources
 
 .. toctree::
    :maxdepth: 2

docs/start/install.rst

Lines changed: 1 addition & 4 deletions
@@ -21,15 +21,12 @@ We recommend using **FSDP** backend to investigate, research and prototype different
 
 For users who pursue better scalability, we recommend using **Megatron-LM** backend. Currently, we support `Megatron-LM v0.11 <https://github.com/NVIDIA/Megatron-LM/tree/v0.11.0>`_. The guide for using Megatron-LM backend can be found in :doc:`Megatron-LM Workers<../workers/megatron_workers>`.
 
-.. note::
-
-   verl directly supports megatron's `GPTModel` API on the main branch with mcore v0.11. For mcore v0.4 try `0.3.x branch <https://github.com/volcengine/verl/tree/v0.3.x>`_ instead.
 
 2. Inference:
 
 For inference, vllm 0.8.3 and later versions have been tested for stability. We recommend turning on env var `VLLM_USE_V1=1` for optimal performance.
 
-For SGLang, refer to the :doc:`SGLang Backend<../workers/sglang_worker>` for detailed installation and usage instructions. **SGLang offers better throughput and is under extensive development.** We encourage users to report any issues or provide feedback via the `SGLang Issue Tracker <https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/106>`_.
+For SGLang, refer to the :doc:`SGLang Backend<../workers/sglang_worker>` for detailed installation and usage instructions. SGLang rollout is under extensive development and offers many advanced features and optimizations. We encourage users to report any issues or provide feedback via the `SGLang Issue Tracker <https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/106>`_.
 
 For huggingface TGI integration, it is usually used for debugging and single GPU exploration.

docs/start/more_resources.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
More Resources
==============

Last updated: 06/30/2025.

- Introduction to verl (`Slides <https://tongyx361.github.io/blogs/posts/verl-intro>`_)
- verl Code Walkthrough (`Slides <https://tongyx361.github.io/blogs/posts/verl-tutorial>`_, `Talk in Chinese <https://hcqnc.xetlk.com/sl/3vACOK>`_)

docs/start/quickstart.rst

Lines changed: 12 additions & 3 deletions
@@ -70,15 +70,15 @@ answer from both the solution and model's output using regular
 expression matching. We assign a reward of 1 to correct
 answer, 0.0 to incorrect answer and 0 to no answer.
 
-For more details, please refer to `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.1/verl/utils/reward_score/gsm8k.py>`_.
+For more details, please refer to `verl/utils/reward_score/gsm8k.py <https://github.com/volcengine/verl/blob/v0.4.1/verl/utils/reward_score/gsm8k.py>`_.
 
 **Training Script**
 
 Now let's run PPO training with the dataset and model above. [2]_
 
 Set the ``data.train_files``, ``data.val_files``, ``actor_rollout_ref.model.path`` and ``critic.model.path`` based on your dataset and model names or paths.
-You may set ``VERL_USE_MODELSCOPE=True`` to download models from modelscope instead of huggingface.
+You may set ``VERL_USE_MODELSCOPE=True`` to download models from `modelscope <https://www.modelscope.cn>`_ instead of `huggingface <https://huggingface.co>`_.
 
 .. code-block:: bash
 
@@ -118,7 +118,16 @@ You are expected to see the following logs, indicating training in progress. The
 
 Checkout :ref:`algo-baseline-page` for full training and validation logs for reference.
 
-The checkpoint is saved at the following dir by default: ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``
+The checkpoint is saved at the following dir by default: ``checkpoints/${trainer.project_name}/${trainer.experiment_name}``. You can merge the saved checkpoints to huggingface model using ``verl.model_merger`` module, for example:
+
+.. code-block:: bash
+
+   python3 -m verl.model_merger merge \
+       --backend fsdp \
+       --local_dir checkpoints/${trainer.project_name}/${trainer.experiment_name}/global_step_1/actor \
+       --target_dir checkpoints/${trainer.project_name}/${trainer.experiment_name}/global_step_1/actor/huggingface
+
+For more details about checkpoint and model merging, please refer to :ref:`checkpoint-page`.
 
 To enable ``wandb`` for experiment tracking, set the following configs:

examples/data_preprocess/geo3k.py

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,8 @@
 
 instruction_following = (
     r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
-    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
+    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
+    r"The final answer MUST BE put in \boxed{}."
 )
 
 # add a row to each data item that represents a unique id
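The split above (repeated in the next file as well) relies on Python's implicit concatenation of adjacent string literals, so the resulting prompt is byte-for-byte identical to the one-line form; a quick check:

```python
# The one-line form (before the change) and the two-line form
# (after the change) produce the same string, because adjacent
# string literals are concatenated at compile time.
single_line = (
    r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
)
split_lines = (
    r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
    r"The final answer MUST BE put in \boxed{}."
)
assert single_line == split_lines  # the edit is purely cosmetic
```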

examples/data_preprocess/geo3k_multiturn_w_tool.py

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@
 test_dataset = dataset["test"]
 instruction_following = (
     r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
-    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
+    r"The reasoning process MUST BE enclosed within <think> </think> tags. "
+    r"The final answer MUST BE put in \boxed{}."
 )
 
 # add a row to each data item that represents a unique id

examples/data_preprocess/gsm8k_multiturn_w_interaction.py

Lines changed: 5 additions & 1 deletion
@@ -63,7 +63,11 @@ def process_fn(example, idx):
 "prompt": [
     {
         "role": "system",
-        "content": ("You are a math expert. You are given a question and you need to solve it step by step. You should rethinking carefully if user point out your answer is wrong. Put your final answer in the format of `#### <answer>`."),
+        "content": (
+            "You are a math expert. You are given a question and you need to solve it step by step. "
+            "You should rethinking carefully if user point out your answer is wrong. "
+            "Put your final answer in the format of `#### <answer>`."
+        ),
     },
     {
         "role": "user",
