Merged
Changes from 10 commits
20 commits
7c9e41d fix(rollout_corr): compute metrics in actor for bypass mode and fix t… (szrlee, Nov 10, 2025)
96ae2be docs(rollout_corr): move to algo/ and add pure_rs preset (szrlee, Nov 10, 2025)
c0ea9bd feat(rollout_corr): add batch normalization option for IS weights (szrlee, Nov 10, 2025)
7de6c5f docs(rollout_corr_math): use REINFORCE in aggregation loss examples f… (szrlee, Nov 10, 2025)
2b34cfe refactor(rollout_corr): simplify metrics computation by removing unus… (szrlee, Nov 10, 2025)
0c42f85 docs(rollout_corr): add prominent cross-references between usage and … (szrlee, Nov 10, 2025)
fef8a48 docs(rollout_corr_math): add dedicated section for batch normalization (szrlee, Nov 10, 2025)
08cc9c7 fix: docstring of compute_policy_loss_with_rollout_correction (tongyx361, Nov 11, 2025)
437a4ab feat: reuse need_recomputation instead of bypass_mode (tongyx361, Nov 11, 2025)
5f9a53b feat: improve comments (tongyx361, Nov 11, 2025)
b2f6370 feat: improve comments (tongyx361, Nov 11, 2025)
79cdbf2 feat: refactor bypass_recomputing_logprobs (tongyx361, Nov 11, 2025)
62e3270 feat(rollout_corr): align batch normalization with IS aggregation level (szrlee, Nov 11, 2025)
b5c19ff docs(rollout_corr): rename decoupled mode presets for clarity and upd… (szrlee, Nov 11, 2025)
11f9aa0 fix(rollout_corr): correct metrics computation to run in decoupled mo… (szrlee, Nov 11, 2025)
58565cb docs(rollout_corr): rename presets for clarity and consistency (szrlee, Nov 11, 2025)
8bb1a0e refactor(rollout_corr): rename config vars for semantic clarity (szrlee, Nov 11, 2025)
6002c00 refactor(rollout_corr): update implementation to use renamed config v… (szrlee, Nov 11, 2025)
7f9ba9c Merge branch 'main' into pr/szrlee/4070 (tongyx361, Nov 11, 2025)
56f69bf fix: ppo_trainer config format (tongyx361, Nov 11, 2025)
4 changes: 2 additions & 2 deletions docs/advance/fully_async.md
@@ -166,11 +166,11 @@ https://github.com/ArronHZG/verl-community/blob/recipe/async_policy/docs/fully_a

During the training process, we observed that metrics and response lengths may become unstable in the later
stages of training. To mitigate this issue, we can use
-the [Rollout Correction](https://verl.readthedocs.io/en/latest/advance/rollout_corr.html)
+the [Rollout Correction](https://verl.readthedocs.io/en/latest/algo/rollout_corr.html)
technique for importance sampling and rejection sampling. To utilize Rollout Correction, we need to compute log_prob using
the training engine, which requires enabling this switch.
Additionally, when compute_prox_log_prob and Rollout Correction are enabled under mode d
-(async stream pipeline with partial rollout), our implementation approximates `Areal's Decoupled PPO`.
+(async stream pipeline with partial rollout), our implementation follows `Decoupled PPO` as described in [Mathematics of Rollout Correction](https://verl.readthedocs.io/en/latest/algo/rollout_corr_math.html).

### Supported Modes

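The changed paragraph in this diff says Rollout Correction applies importance sampling and rejection sampling, and that with `compute_prox_log_prob` enabled under mode d the implementation follows Decoupled PPO. As a rough illustration only (not verl's implementation), the sketch below shows one common way those pieces combine per token. It assumes three log-probabilities per token: the policy being optimized (`logp_theta`), a proximal policy recomputed by the training engine (`logp_prox`, presumably what `compute_prox_log_prob` provides), and the rollout engine's own values (`logp_rollout`). The function name `decoupled_ppo_loss`, the truncation cap `is_cap`, the rejection bound `rs_bound`, and the token-mean aggregation are hypothetical choices, not verl config keys or defaults; the linked Rollout Correction docs describe the real options.

```python
import math
from typing import List, Tuple


def decoupled_ppo_loss(
    logp_theta: List[float],    # log pi_theta(a_t | s_t): policy being optimized
    logp_prox: List[float],     # log pi_prox(a_t | s_t): recomputed by the training engine
    logp_rollout: List[float],  # log pi_rollout(a_t | s_t): reported by the rollout engine
    advantages: List[float],
    clip_eps: float = 0.2,
    is_cap: float = 2.0,        # hypothetical truncation cap for IS weights
    rs_bound: float = 10.0,     # hypothetical rejection bound on the IS ratio
) -> Tuple[float, List[float], List[bool]]:
    """Illustrative token-level Decoupled PPO loss with truncated IS and rejection.

    Not verl's API: names, thresholds, and token-mean aggregation are assumptions.
    Returns (loss, truncated IS weights, per-token keep mask).
    """
    n = len(advantages)
    if not (len(logp_theta) == len(logp_prox) == len(logp_rollout) == n):
        raise ValueError("all inputs must have the same length")

    weights: List[float] = []
    keep: List[bool] = []
    total = 0.0
    for lt, lp, lr, adv in zip(logp_theta, logp_prox, logp_rollout, advantages):
        # Off-policy gap between the rollout engine's policy and the proximal policy.
        is_ratio = math.exp(lp - lr)
        # Rejection sampling: drop tokens whose ratio lies outside [1/rs_bound, rs_bound].
        kept = (1.0 / rs_bound) <= is_ratio <= rs_bound
        # Truncated importance weight to limit variance from large ratios.
        w = min(is_ratio, is_cap)
        # PPO trust region is measured against the proximal policy, not the rollout one.
        ratio = math.exp(lt - lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)
        weights.append(w)
        keep.append(kept)
        if kept:
            total += -w * surrogate
    n_kept = sum(keep)
    # Token-mean over kept tokens; an all-rejected batch contributes zero loss.
    loss = total / n_kept if n_kept > 0 else 0.0
    return loss, weights, keep


if __name__ == "__main__":
    lp = math.log(0.5)
    # All three policies agree, so weights are 1 and the loss is -mean(advantages).
    print(decoupled_ppo_loss([lp] * 3, [lp] * 3, [lp] * 3, [1.0, -1.0, 2.0]))
```

The design point this sketch tries to convey is that decoupled PPO separates two corrections: the clip is applied to the ratio against the proximal policy, while the rollout-to-proximal gap is handled by the (truncated, possibly rejected) importance weight. That separation is why log-probabilities must be recomputed by the training engine rather than reused from the rollout engine.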