[doc,algo] feat: Rollout Correction - Fix Metrics, Add Documentation, and Add Batch Normalization #4070
Merged: ISEEKYAN merged 20 commits into verl-project:main from szrlee:yingru/rollout_correction_fix on Nov 12, 2025
Commits (20 total; changes shown from 1 commit)
- 7c9e41d fix(rollout_corr): compute metrics in actor for bypass mode and fix t… (szrlee)
- 96ae2be docs(rollout_corr): move to algo/ and add pure_rs preset (szrlee)
- c0ea9bd feat(rollout_corr): add batch normalization option for IS weights (szrlee)
- 7de6c5f docs(rollout_corr_math): use REINFORCE in aggregation loss examples f… (szrlee)
- 2b34cfe refactor(rollout_corr): simplify metrics computation by removing unus… (szrlee)
- 0c42f85 docs(rollout_corr): add prominent cross-references between usage and … (szrlee)
- fef8a48 docs(rollout_corr_math): add dedicated section for batch normalization (szrlee)
- 08cc9c7 fix: docstring of compute_policy_loss_with_rollout_correction (tongyx361)
- 437a4ab feat: reuse need_recomputation instead of bypass_mode (tongyx361)
- 5f9a53b feat: improve comments (tongyx361)
- b2f6370 feat: improve comments (tongyx361)
- 79cdbf2 feat: refactor bypass_recomputing_logprobs (tongyx361)
- 62e3270 feat(rollout_corr): align batch normalization with IS aggregation level (szrlee)
- b5c19ff docs(rollout_corr): rename decoupled mode presets for clarity and upd… (szrlee)
- 11f9aa0 fix(rollout_corr): correct metrics computation to run in decoupled mo… (szrlee)
- 58565cb docs(rollout_corr): rename presets for clarity and consistency (szrlee)
- 8bb1a0e refactor(rollout_corr): rename config vars for semantic clarity (szrlee)
- 6002c00 refactor(rollout_corr): update implementation to use renamed config v… (szrlee)
- 7f9ba9c Merge branch 'main' into pr/szrlee/4070 (tongyx361)
- 56f69bf fix: ppo_trainer config format (tongyx361)
fix(rollout_corr): compute metrics in actor for bypass mode and fix trainer bugs

Fix three critical issues in rollout correction metrics computation:
1. Missing rollout_corr_config parameter in ray_trainer.py line 1178 compute_rollout_correction_and_add_to_batch() call
2. Trainer computes meaningless metrics in bypass mode since old=rollout
   Results in KL≈0, weights≈1.0 that don't reflect actual drift
3. No metrics computed for bypass+non-pure mode during actor training
   Bypass+pure already computes metrics in pure loss function

Solution:
- Add compute_rollout_corr_metrics_from_logprobs() helper function to compute metrics using current policy vs rollout policy log probabilities
- Always pass rollout_correction config to actor in bypass mode for metrics
- Skip trainer metrics in bypass mode, compute meaningful metrics in actor
- Actor computes per-microbatch metrics showing drift as training progresses

Behavior by mode:
- Bypass+non-pure: Actor computes metrics (π_current vs π_rollout)
- Bypass+pure: Pure loss function computes metrics internally
- Decoupled: Trainer computes metrics (π_old vs π_rollout)

Files changed:
- verl/trainer/ppo/rollout_corr_helper.py: Add metrics helper, always pass config
- verl/trainer/ppo/ray_trainer.py: Fix missing param, skip bypass metrics
- verl/workers/actor/dp_actor.py: Add rollout_log_probs selection, compute metrics
- verl/workers/actor/megatron_actor.py: Add rollout_log_probs selection, compute metrics
- verl/trainer/ppo/core_algos.py: Remove outdated documentation
commit 7c9e41daa02f14c96facf52651cb944d581c9219
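
To illustrate issue 2 in the commit message, here is a minimal sketch of why metrics computed from old-policy versus rollout-policy log probabilities collapse to KL = 0 and weights = 1 when the two are the same tensor, and why comparing the current policy against the rollout policy gives a usable drift signal. The function name `sketch_rollout_corr_metrics`, its arguments, and the metric keys are hypothetical and chosen for illustration; this is not the body of `compute_rollout_corr_metrics_from_logprobs()`, and it uses plain Python lists rather than tensors.

```python
import math


def sketch_rollout_corr_metrics(policy_log_probs, rollout_log_probs, mask):
    """Hypothetical helper for illustration; not verl's actual implementation.

    Each argument is a flat list with one entry per token. mask is 1 for
    response tokens that count and 0 for padding.
    """
    n = sum(mask)
    if n == 0:
        raise ValueError("mask selects no tokens")
    # Per-token log ratio log(pi_policy / pi_rollout), valid tokens only.
    log_ratios = [
        p - r
        for p, r, m in zip(policy_log_probs, rollout_log_probs, mask)
        if m
    ]
    # Tokens were sampled from the rollout policy, so the mean of
    # log(pi_rollout / pi_policy) is a simple estimate of
    # KL(pi_rollout || pi_policy). On small samples it can be negative.
    kl = sum(-x for x in log_ratios) / n
    # Token-level importance sampling weights pi_policy / pi_rollout.
    weights = [math.exp(x) for x in log_ratios]
    return {
        "kl": kl,
        "is_weight_mean": sum(weights) / n,
        "is_weight_max": max(weights),
    }


rollout = [-1.2, -0.7, -2.3, -0.1]
current = [-1.5, -0.9, -2.0, -0.1]
mask = [1, 1, 1, 0]  # last token is padding

# Bypass mode as seen by the trainer: "old" log probs are the rollout log probs.
degenerate = sketch_rollout_corr_metrics(rollout, rollout, mask)
# What the actor can measure instead: current policy vs rollout policy.
drift = sketch_rollout_corr_metrics(current, rollout, mask)
```

In the degenerate call every log ratio is exactly zero, so the metrics carry no information about drift. The second call yields a nonzero KL estimate and importance weights away from 1, which is the kind of per-microbatch signal the commit message describes moving into the actor.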