[trainer, data] feat: Dynamic Data Generation #2312
zhaochenyang20 merged 30 commits into verl-project:main
Conversation
…to the batch during training Co-authored-by: Justin Wong <wong.justin@berkeley.edu> Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
Is there a way to unblock CI workflows while discussions are still ongoing?
No, so we have to rerun them many times. 🥲
Hey Justin, could you elaborate on what this PR is aiming at, especially what Dynamic Data Generation is? I can help contact the verl team for their feedback, but it's better if we make it well-defined on our side first. Thanks!
To elaborate, this PR refactors the code and provides an interface that makes it easier to implement other dynamic data generation algorithms. In particular, we want the model to propose new tasks based on which tasks currently do or don't succeed.
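To make the idea concrete, here is a minimal sketch of what such an interface could look like. All names here are hypothetical illustrations, not verl's actual API:

```python
from abc import ABC, abstractmethod

# Illustrative sketch only -- class and method names are hypothetical,
# not the actual interface added by this PR.
class DataGen(ABC):
    """Called between training steps to propose new datapoints."""

    @abstractmethod
    def generate(self, dataset: list) -> list:
        """Return new datapoints to append to the training set."""


class HardTaskVariationGen(DataGen):
    """Toy example: re-propose the task with the lowest success rate."""

    def __init__(self, success_rates: dict):
        self.success_rates = success_rates  # task -> observed success rate

    def generate(self, dataset: list) -> list:
        # A real generator would call an LLM to produce variations;
        # here we simply pick the hardest existing task.
        hardest = min(dataset, key=lambda t: self.success_rates.get(t, 1.0))
        return [hardest]
```

A custom generator would plug into the trainer the same way, returning whatever new rows it wants appended.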
Let me know if you have additional questions! It looks like the CI passed and it's ready to merge.
Do you intend to support dataloader save/resume with dynamic datagen?
Although we currently do not support dataloader save and resume, this can be added in the future.
I'd additionally like to highlight that a custom dataset class won't be sufficient without the flags added in this PR to distinguish train vs. val datasets. Only the train dataset should be allowed to generate new training datapoints; the val dataset should remain fixed.
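A minimal sketch of that guard, with hypothetical names (not verl's actual `RLHFDataset`): an `is_train` flag decides whether appending new datapoints is allowed.

```python
# Hypothetical sketch: only the train split may grow; the val split
# rejects appends so evaluation stays comparable across steps.
class DatasetSketch:
    def __init__(self, rows, is_train: bool = True):
        self.rows = list(rows)
        self.is_train = is_train

    def append_dataframe(self, new_rows) -> int:
        if not self.is_train:
            raise RuntimeError("validation dataset must remain fixed")
        self.rows.extend(new_rows)
        return len(self.rows)
```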
eric-haibin-lin left a comment:
ok with merging as soon as all tests pass
### What does this PR do?

Add an interface to support dynamic data generation, which allows new tasks to be created between each step of training. This PR refactors the code and provides an interface that makes it easier to implement other dynamic data generation algorithms. In particular, we want the model to propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for web tasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335.

Basic example: imagine wanting to generate variations on the hardest tasks in the current training loop. We implement this as an LLM API call in a custom data generator, followed by a custom sampler that selects the desirable datapoints as they are generated.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation
- [x] Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,`, e.g. `[megatron, fsdp, doc]`
  - `{type}` is one of `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` — more details in the Usage Example section below.

### API and Usage Example

> Demonstrate how the API changes, if any, and provide usage example(s) if possible.

1. Change the yaml to enable it:

```
--- a/verl/trainer/config/ppo_trainer.yaml
+++ b/verl/trainer/config/ppo_trainer.yaml
@@ -93,11 +93,11 @@ data:
   # The path to the file containing your customized data generation class.
   # E.g. 'verl.utils.dataset.datagen'
-  path: null
+  path: 'verl.utils.dataset.datagen'
   # The class name of the data generation class within the specified file.
   # E.g. NoOpDataGen
-  name: null
+  name: 'NoOpDataGen'
```

The no-op data generator simply re-appends the first datapoint at the end of the dataset. You can verify this happened by printing the size of the dataset each epoch:

```
(TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 0/435 [00:00<?, ?it/s]
(WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster]
(WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7474
(TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s]
(TaskRunner pid=71298) 7474
(TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 1/435 [02:32<18:24:31, 152.70s/it]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7475
```

Note that the original dataset length for `gsm8k_w_tool` is 7473.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

n/a

### Specific Changes

> List the specific changes.

- Add an abstract datagen class that is used in `ray_trainer.py` to add data to the dataset
- Refactor filtering out of `_read_files_and_tokenize` in `RLHFDataset`
- Add `append_dataframe` to `RLHFDataset`
- Add a util for getting a type from a file

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
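To illustrate how the pieces listed above fit together, here is a hedged end-to-end sketch of the no-op flow. All names are illustrative stand-ins, not the code in `ray_trainer.py`; the toy dataset grows by one row per epoch, matching the 7473 → 7474 → 7475 progression in the logs above.

```python
# Hypothetical sketch of the trainer-side hook; not verl's actual code.
class NoOpDataGen:
    """Mirrors the PR's no-op example: re-append the first datapoint."""

    def generate(self, dataset):
        print("NoOpDataGen: No operation performed on the dataset.")
        return [dataset.dataframe[0]]


class ToyDataset:
    """Stand-in for RLHFDataset with the PR's append_dataframe method."""

    def __init__(self, rows):
        self.dataframe = list(rows)

    def append_dataframe(self, new_rows):
        self.dataframe.extend(new_rows)


def run_epoch(datagen, train_dataset):
    # Between epochs the trainer asks the generator for new rows and
    # appends them, so the train set grows while the val set stays fixed.
    new_rows = datagen.generate(train_dataset)
    train_dataset.append_dataframe(new_rows)
    return len(train_dataset.dataframe)
```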
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do?

Add an interface to support dynamic data generation, which allows new tasks to be created between each step of training.

To elaborate, this PR refactors the code and provides an interface that makes it easier to implement other dynamic data generation algorithms. In particular, we want the model to propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for web tasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335.

Basic example that could be useful: imagine wanting to generate variations on the hardest tasks in the current training loop. We implement this as an LLM API call in a custom data generator, followed by a custom sampler that selects the desirable datapoints as they are generated.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation
- [x] Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is one of `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` — more details in the Usage Example section below.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

1. Change the yaml to enable it:

```
--- a/verl/trainer/config/ppo_trainer.yaml
+++ b/verl/trainer/config/ppo_trainer.yaml
@@ -93,11 +93,11 @@ data:
   # The path to the file containing your customized data generation class.
   # E.g. 'verl.utils.dataset.datagen'
-  path: null
+  path: 'verl.utils.dataset.datagen'
   # The class name of the data generation class within the specified file.
   # E.g. NoOpDataGen
-  name: null
+  name: 'NoOpDataGen'
```

The no-op data generator simply re-appends the first datapoint at the end of the dataset. You can verify that this happened by printing out the size of the dataset each epoch:

```
(TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%| | 0/435 [00:00<?, ?it/s]
(WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster]
(WorkerDict pid=74307)   tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7474
(TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s]
(TaskRunner pid=71298) 7474
(TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%| | 1/435 [02:32<18:24:31, 152.70s/it]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7475
```

Note that the original dataset length is 7473 for `gsm8k_w_tool`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

n/a

### Specific Changes

> List the specific changes.

- Add an abstract datagen class that is used in `ray_trainer.py` to add data to the dataset
- Refactor filtering out of `_read_files_and_tokenize` in `RLHFDataset`
- Add `append_dataframe` to `RLHFDataset`
- Add a util for getting a type from a file

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
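For readers who want to plug in their own generator, here is a minimal sketch of what the datagen hook could look like. This is illustrative only: apart from `NoOpDataGen` and `RLHFDataset.append_dataframe`, which this PR names, the class and method names below are assumptions, not the PR's actual API.

```python
from abc import ABC, abstractmethod

import pandas as pd


class AbstractDataGen(ABC):
    """Hook invoked by the trainer between steps to grow the train dataset.

    Sketch only: the concrete interface in the PR may differ.
    """

    @abstractmethod
    def generate(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        """Return a dataframe of new datapoints to append to the train set."""


class NoOpDataGen(AbstractDataGen):
    """Re-appends the first datapoint, matching the log output above."""

    def generate(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        print("NoOpDataGen: No operation performed on the dataset.")
        return dataframe.iloc[[0]]  # one-row DataFrame: the first datapoint


# Hypothetical wiring inside the trainer loop:
#     new_rows = datagen.generate(train_dataset.dataframe)
#     train_dataset.append_dataframe(new_rows)
```

A dynamic generator would replace the `iloc[[0]]` line with, e.g., an LLM API call that proposes variations of currently failing tasks.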
|
I am fairly confident this implementation is incorrect. DataLoader workers run in separate processes and therefore hold their own in-memory copies of the dataframe, so modifications made to the dataframe after each batch will not reach the workers. Additionally, the dataloader will not see the change in dataset size, so it will never yield the new samples. |
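The worker-copy behavior described in this comment can be reproduced with a small stdlib sketch (no torch required): a plain list stands in for the dataframe, and a forked child process plays the role of a dataloader worker. Assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp


def _report_len(dataset, q):
    # The forked child sees a snapshot of `dataset` taken at fork time;
    # parent-side mutations made afterwards are invisible here.
    q.put(len(dataset))


def demo():
    dataset = list(range(7473))        # stand-in for the in-memory dataframe
    ctx = mp.get_context("fork")       # dataloader workers are separate processes
    q = ctx.Queue()
    worker = ctx.Process(target=_report_len, args=(dataset, q))
    worker.start()                     # child snapshots `dataset` here
    dataset.append({"prompt": "new"})  # dynamic datagen appends after workers exist
    worker_view = q.get()
    worker.join()
    return worker_view, len(dataset)


if __name__ == "__main__":
    print(demo())  # → (7473, 7474): the worker never sees the appended row
```

This is why appending to the dataset object on the trainer process is not, by itself, enough; the dataloader (or its workers) must be rebuilt or otherwise notified of the new samples.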
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
`bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh` more details in Usage Example section below. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. 1. Change the yaml to enable ``` --- a/verl/trainer/config/ppo_trainer.yaml +++ b/verl/trainer/config/ppo_trainer.yaml @@ -93,11 +93,11 @@ data: # The path to the file containing your customized data generation class. # E.g. 'verl.utils.dataset.datagen' - path: null + path: 'verl.utils.dataset.datagen' # The class name of the data generation class within the specified file. # E.g. NoOpDataGen - name: null + name: 'NoOpDataGen' ``` The noop dataset just reappends the first datapoint at the end. You can see that this correctly happened by printing out the size of the dataset each epoch: ``` (TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 0/435 [00:00<?, ?it/s] (WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) 
[repeated 3x across cluster] (WorkerDict pid=74307) tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7474 (TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s] (TaskRunner pid=71298) 7474 (TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - 
perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599 (TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset. Training Progress: 0%| | 1/435 [02:32<18:24:31, 152.70s/it] (TaskRunner pid=71298) filter dataset len: 1 (TaskRunner pid=71298) new dataset len: 7475 ``` Note the original dataset length is 7473 for `gsm8k_w_tool` ### High-Level Design > Demonstrate the high-level design if this PR is complex. n/a ### Specific Changes > List the specific changes. - Add an abstract datagen class that's used in ray_trainer.py to add data to the dataset - We refactor filtering out of `_read_files_and_tokenize` in RLHFDataset - We add `append_dataframe` to RLHFDataset - Add util for getting type from file. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). --------- Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
### What does this PR do? Add interface to support dynamic data generation which will allow us to create new tasks between each step of training. To elaborate, this PR is refactoring the code and providing an interface to make it easier to implement other dynamic data generation algorithms. In particular, we want to have the model propose new tasks based on which tasks currently do or don't succeed. This has been shown to be useful for webtasks and reasoning: https://arxiv.org/pdf/2506.14205, https://openreview.net/pdf?id=oVKEAFjEqv, https://arxiv.org/abs/2502.06776, https://arxiv.org/pdf/2505.03335. Basic example that could be useful: Imagine wanting to generate variations on the hardest tasks for the current training loop. We implement this as a LLM API call as a custom data generator followed by a custom sampler that selects the desirable datapoints as they're generated. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: is:pr is:open data generation - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. 
Run `bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn.sh`; more details in the Usage Example section below.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

1. Change the yaml to enable it:

```diff
--- a/verl/trainer/config/ppo_trainer.yaml
+++ b/verl/trainer/config/ppo_trainer.yaml
@@ -93,11 +93,11 @@ data:
   # The path to the file containing your customized data generation class.
   # E.g. 'verl.utils.dataset.datagen'
-  path: null
+  path: 'verl.utils.dataset.datagen'

   # The class name of the data generation class within the specified file.
   # E.g. NoOpDataGen
-  name: null
+  name: 'NoOpDataGen'
```

The no-op data generator simply re-appends the first datapoint at the end of the dataset. You can see that this correctly happened by printing out the size of the dataset each epoch:

```
(TaskRunner pid=71298) step:0 - val-core/openai/gsm8k/reward/mean@1:0.668
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 0/435 [00:00<?, ?it/s]
(WorkerDict pid=74307) /workplace/rl_workspace/src/AGIEmergeRL/vendor/verl_2/verl/verl/workers/rollout/sglang_rollout/utils.py:49: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.) [repeated 3x across cluster]
(WorkerDict pid=74307)   tensor_data = torch.ByteTensor(np.frombuffer(serialized_data, dtype=np.uint8)).to(device) [repeated 3x across cluster]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7474
(TaskRunner pid=71298) Filtering prompts longer than 1024 tokens: 100%|██████████| 1/1 [00:00<00:00, 165.34 examples/s]
(TaskRunner pid=71298) 7474
(TaskRunner pid=71298) step:1 - global_seqlen/min:88786.000 - global_seqlen/max:101138.000 - global_seqlen/minmax_diff:12352.000 - global_seqlen/balanced_min:94905.000 - global_seqlen/balanced_max:94905.000 - global_seqlen/mean:94905.000 - actor/entropy:0.361 - actor/kl_loss:0.002 - actor/kl_coef:0.001 - actor/pg_loss:0.022 - actor/pg_clipfrac:0.000 - actor/ppo_kl:0.000 - actor/pg_clipfrac_lower:0.000 - actor/grad_norm:1.301 - perf/mfu/actor:0.107 - perf/max_memory_allocated_gb:7.201 - perf/max_memory_reserved_gb:12.896 - perf/cpu_memory_used_gb:57.490 - actor/lr:0.000 - training/global_step:1.000 - training/epoch:0.000 - critic/score/mean:0.677 - critic/score/max:1.000 - critic/score/min:0.000 - critic/rewards/mean:0.677 - critic/rewards/max:1.000 - critic/rewards/min:0.000 - critic/advantages/mean:-0.021 - critic/advantages/max:1.500 - critic/advantages/min:-1.500 - critic/returns/mean:-0.021 - critic/returns/max:1.500 - critic/returns/min:-1.500 - response_length/mean:376.086 - response_length/max:1024.000 - response_length/min:58.000 - response_length/clip_ratio:0.020 - prompt_length/mean:365.359 - prompt_length/max:459.000 - prompt_length/min:327.000 - prompt_length/clip_ratio:0.000 - timing_s/generate_sequences:44.158 - timing_s/reshard:2.658 - timing_s/gen:47.161 - timing_s/reward:0.423 - timing_s/old_log_prob:15.347 - timing_s/ref:28.668 - timing_s/adv:0.039 - timing_s/update_actor:60.185 - timing_s/step:151.945 - timing_per_token_ms/gen:0.122 - timing_per_token_ms/adv:0.000 - timing_per_token_ms/ref:0.038 - timing_per_token_ms/update_actor:0.079 - perf/total_num_tokens:759240.000 - perf/time_per_step:151.945 - perf/throughput:624.599
(TaskRunner pid=71298) NoOpDataGen: No operation performed on the dataset.
Training Progress:   0%|          | 1/435 [02:32<18:24:31, 152.70s/it]
(TaskRunner pid=71298) filter dataset len: 1
(TaskRunner pid=71298) new dataset len: 7475
```

Note the original dataset length is 7473 for `gsm8k_w_tool`.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

n/a

### Specific Changes

> List the specific changes.

- Add an abstract datagen class that is used in `ray_trainer.py` to add data to the dataset.
- Refactor filtering out of `_read_files_and_tokenize` in `RLHFDataset`.
- Add `append_dataframe` to `RLHFDataset`.
- Add a util for getting a type from a file.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

---------

Co-authored-by: Frederick Robinson <frederick.robinson@frrad.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
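To make the interface concrete, here is a minimal sketch of what an abstract datagen class and the `NoOpDataGen` behavior seen in the logs above might look like. Apart from `NoOpDataGen` and `append_dataframe`, the names below (`AbstractDataGen`, `generate`, the list-of-dicts dataset) are illustrative assumptions, not the exact API introduced by this PR:

```python
from abc import ABC, abstractmethod


class AbstractDataGen(ABC):
    """Illustrative base class: proposes new datapoints to append to the
    training dataset between training steps (name is an assumption)."""

    @abstractmethod
    def generate(self, dataset: list[dict]) -> list[dict]:
        """Return new datapoints to append to ``dataset``."""


class NoOpDataGen(AbstractDataGen):
    """Sketch of the no-op generator: re-appends the first datapoint,
    so the dataset grows by exactly one row per epoch (7473 -> 7474 -> ...)."""

    def generate(self, dataset: list[dict]) -> list[dict]:
        print("NoOpDataGen: No operation performed on the dataset.")
        return dataset[:1]
```

The trainer loop would then do something along the lines of `train_dataset.append_dataframe(datagen.generate(train_dataset))` at each epoch boundary; only the train dataset is extended, while the val dataset stays fixed.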