[trainer, data] feat: Dynamic Data Generation #2312

Merged
zhaochenyang20 merged 30 commits into verl-project:main from jwong8314:main on Jul 9, 2025

Conversation

@jwong8314 (Contributor) commented Jul 1, 2025

What does this PR do?

Add an interface to support dynamic data generation, which allows us to create new tasks between training steps.

To elaborate, this PR refactors the code and provides an interface that makes it easier to implement other dynamic data generation algorithms. In particular, we want the model to propose new tasks based on which tasks currently succeed or fail. This has been shown to be useful for web tasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335.

A basic example of where this could be useful:

Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as an LLM API call in a custom data generator, followed by a custom sampler that selects the desirable datapoints as they're generated.
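As a rough illustration, such a generator might look like the sketch below. The DataGen base class, the generate signature, the score column, and the llm.complete call are all hypothetical stand-ins; the abstract class added by this PR may differ.

```python
from abc import ABC, abstractmethod
import pandas as pd

class DataGen(ABC):
    """Hypothetical base class: return new rows to append at each step."""

    @abstractmethod
    def generate(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        ...

class HardTaskVariationGen(DataGen):
    """Ask an LLM for variations on the tasks the policy currently fails."""

    def __init__(self, llm, fail_threshold: float = 0.2, k: int = 8):
        self.llm = llm                       # hypothetical LLM API client
        self.fail_threshold = fail_threshold
        self.k = k

    def generate(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        # Pick the hardest tasks by their latest success rate (assumes the
        # trainer records a per-row 'score'; verl may track this elsewhere).
        hard = dataframe[dataframe["score"] < self.fail_threshold].head(self.k)
        variations = [
            self.llm.complete(f"Write a harder variation of this task:\n{p}")
            for p in hard["prompt"]
        ]
        # A custom sampler can then filter these before they are appended.
        return pd.DataFrame({"prompt": variations})
```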

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: is:pr is:open data generation
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh

More details in the Usage Example section below.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

  1. Change the YAML to enable it:

```
--- a/verl/trainer/config/ppo_trainer.yaml
+++ b/verl/trainer/config/ppo_trainer.yaml
@@ -93,11 +93,11 @@ data:
 
     # The path to the file containing your customized data generation class.
     # E.g. 'verl.utils.dataset.datagen'
-    path: null 
+    path: 'verl.utils.dataset.datagen'
 
     # The class name of the data generation class within the specified file.
     # E.g. NoOpDataGen
-    name: null 
+    name: 'NoOpDataGen'
```
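For reference, a module path/class name pair like this is typically resolved via importlib; here is a minimal sketch (the helper name is invented, and verl's actual loading utility may differ, e.g. it may also accept file paths):

```python
import importlib

def load_datagen_class(path: str, name: str):
    """Hypothetical resolver for the (path, name) pair configured above."""
    module = importlib.import_module(path)  # e.g. 'verl.utils.dataset.datagen'
    return getattr(module, name)            # e.g. the NoOpDataGen class

# DataGenCls = load_datagen_class("verl.utils.dataset.datagen", "NoOpDataGen")
# datagen = DataGenCls()
```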

The no-op data generator simply re-appends the first datapoint to the end of the dataset. You can verify this happened correctly by printing the dataset size each epoch:

```
(TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 0/435 [00:00<?, ?it/s]
(WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster]
(WorkerDict pid=74307)   tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7474
(TaskRunner pid=71298) 
Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s]
(TaskRunner pid=71298) 7474
(TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 1/435 [02:32<18:24:31, 152.70s/it]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7475
```

Note that the original dataset length is 7473 for gsm8k_w_tool.
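Judging from the log above (one new row per epoch, "filter dataset len: 1"), the no-op generator plausibly reduces to something like this sketch; the real class may differ:

```python
import pandas as pd

class NoOpDataGen:
    """Sketch of the observed behavior: re-append the first datapoint."""

    def generate(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        print("NoOpDataGen: No operation performed on the dataset.")
        # One returned row explains "filter dataset len: 1" and the growth
        # 7473 -> 7474 -> 7475 in the log above.
        return dataframe.iloc[[0]]
```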

High-Level Design

Demonstrate the high-level design if this PR is complex.

n/a

Specific Changes

List the specific changes.

  • Add an abstract datagen class that is used in ray_trainer.py to add data to the dataset.
  • Refactor filtering out of _read_files_and_tokenize in RLHFDataset.
  • Add append_dataframe to RLHFDataset (see the sketch after this list).
  • Add a util for getting the type from a file.
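
To make the dataset-side changes concrete, here is a rough sketch of how append_dataframe could compose with the refactored filtering. Everything except the append_dataframe and _read_files_and_tokenize names is an assumption, not verl's actual code:

```python
import datasets

class RLHFDataset:  # heavily abridged sketch, not the real class
    def __init__(self, dataframe: datasets.Dataset, max_prompt_length: int = 1024):
        self.max_prompt_length = max_prompt_length
        self.dataframe = self._filter(dataframe)

    def _filter(self, df: datasets.Dataset) -> datasets.Dataset:
        # Filtering now lives outside _read_files_and_tokenize so newly
        # generated rows can reuse it (this length check is a placeholder
        # for the real tokenizer-based filter).
        return df.filter(lambda row: len(row["prompt"]) <= self.max_prompt_length)

    def append_dataframe(self, new_rows: datasets.Dataset) -> None:
        new_rows = self._filter(new_rows)
        print(f"filter dataset len: {len(new_rows)}")
        self.dataframe = datasets.concatenate_datasets([self.dataframe, new_rows])
        print(f"new dataset len: {len(self.dataframe)}")
```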

Checklist Before Submitting

Important

Please check all the following items before requesting a review; otherwise the reviewer might deprioritize this PR for review.

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add / Update the documentation.
  • Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
  • Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace.

jwong8314 and others added 6 commits July 1, 2025 17:00
…to the batch during training

Co-authored-by: Justin Wong <wong.justin@berkeley.edu>
Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
@jwong8314 (Contributor, Author)

Is there a way to unblock CI workflows while discussions are still ongoing?

jwong8314 requested a review from zhaochenyang20 on July 1, 2025 at 21:57
@zhaochenyang20 (Collaborator)

Is there a way to unblock CI workflows while discussions are still ongoing?

No, so we have to rerun many times. 🥲

@zhaochenyang20 (Collaborator)

Hey Justin, could you elaborate on what this PR is aiming at, especially what Dynamic Data Generation is?

I can help contact the verl team and get their feedback, but it's better for us to make it well-defined on our side first. Thanks!!

@jwong8314 (Contributor, Author)

To elaborate, this PR refactors the code and provides an interface that makes it easier to implement other dynamic data generation algorithms. In particular, we want the model to propose new tasks based on which tasks currently succeed or fail.

@jwong8314 (Contributor, Author)

Let me know if you have additional questions! It looks like the CI passed and it's ready to merge.

@eric-haibin-lin (Collaborator)

Do you intend to support dataloader save/resume with dynamic datagen?

@jwong8314 (Contributor, Author)

Although we currently do not support dataloader save and resume, this can be added in the future.

jwong8314 requested a review from eric-haibin-lin on July 7, 2025 at 06:28
@jwong8314 (Contributor, Author)

I'd additionally like to highlight that a custom dataset class won't be sufficient without the flags added in this PR to distinguish train vs. val datasets. Only the train dataset should be allowed to generate new training datapoints; the val dataset should remain fixed.
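
For illustration, the guard being argued for might look like this sketch (the function and flag are hypothetical, not the PR's API):

```python
def maybe_generate(dataset, datagen, is_train: bool) -> None:
    """Only the training split may grow; validation stays fixed."""
    if not is_train:
        return  # the val dataset must remain a stable benchmark
    new_rows = datagen.generate(dataset.dataframe)
    dataset.append_dataframe(new_rows)
```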

I do not think it's necessary to keep the DynamicGenDataset in verl. It can live in your private recipe repo, since verl already provides data.custom_cls to accept any custom dataset class.

@eric-haibin-lin (Collaborator) left a comment

ok with merging as soon as all tests pass

zhaochenyang20 merged commit ab11fff into verl-project:main on Jul 9, 2025
51 of 52 checks passed
lkc233 pushed a commit to lkc233/verl that referenced this pull request Jul 10, 2025
ArronHZG pushed a commit to imh966/verl that referenced this pull request Jul 10, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
@ASchneidman

I am fairly confident this implementation is incorrect. Data loader workers are on separate processes, and thus have their own in memory copies of the dataframe. Modifications to the dataframe after each batch will not reach the dataloader workers. Additionally, the dataloader will not see the change to the size of the dataset, so it will not get the new samples.
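
The concern can be reproduced outside verl with a toy PyTorch dataset; everything below is a hypothetical repro, not verl code:

```python
from torch.utils.data import Dataset, DataLoader

class GrowableDataset(Dataset):
    """Toy dataset whose backing list grows between epochs."""
    def __init__(self):
        self.items = list(range(4))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

if __name__ == "__main__":
    ds = GrowableDataset()
    # num_workers > 0 gives each worker process its own private copy of `ds`;
    # persistent_workers keeps those stale copies alive across epochs.
    loader = DataLoader(ds, batch_size=1, num_workers=2, persistent_workers=True)

    _ = list(loader)      # epoch 0: workers start with a 4-item copy
    ds.items.append(999)  # grow the dataset in the main process only

    try:
        print([b.item() for b in loader])  # epoch 1
    except IndexError:
        # The main-process sampler now yields index 4, but each worker's
        # private copy still has only 4 items, so the fetch fails.
        print("workers never saw the appended row")
```

Without persistent_workers, workers are re-forked each epoch and would pick up the new row, but mid-epoch mutations still never reach already-running workers.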

chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026