Merged
18 commits
12d7434
[trainer] feat: Enhance PrimeRewardManager with configurable num_proc…
JoyboyBrian Nov 10, 2025
608adae
[reward] feat: Add PrimeRewardLoopManager to reward loop module
JoyboyBrian Nov 12, 2025
bd74449
[trainer] fix: Ensure num_processes is set for PrimeRewardManager
JoyboyBrian Nov 12, 2025
744310e
[reward] fix: Remove logger level setting from prime.py
JoyboyBrian Nov 12, 2025
7fffa8f
[reward] fix: Introduce class-level flag for PrimeRewardLoopManager i…
JoyboyBrian Nov 12, 2025
1c94047
[reward] feat: Implement multi-layer rate limiting in PrimeRewardLoop…
JoyboyBrian Nov 12, 2025
7a4013a
revert
JoyboyBrian Nov 12, 2025
de39254
revert
JoyboyBrian Nov 12, 2025
9e20630
typo
JoyboyBrian Nov 12, 2025
f6fd9f6
rename
JoyboyBrian Nov 12, 2025
fa72243
[reward] docs: Enhance AsyncTokenBucket and RateLimitedRewardLoopMana…
JoyboyBrian Nov 12, 2025
2176f44
[reward] fix: Update test cases and rate limiting logic
JoyboyBrian Nov 12, 2025
05be129
[reward] fix: Refactor AsyncTokenBucket logic for handling token requ…
JoyboyBrian Nov 12, 2025
1f9bfca
[reward] refactor: Update registration of RateLimitedRewardLoopManager
JoyboyBrian Nov 12, 2025
355e42b
[reward] refactor: Enhance reward manager instantiation logic
JoyboyBrian Nov 12, 2025
61b5d11
[reward] refactor: Improve event loop management in RewardLoopManager…
JoyboyBrian Nov 12, 2025
392d085
fix: Add __call__ method to RateLimitedRewardLoopManager
JoyboyBrian Nov 13, 2025
96b918e
fix: Resolve CI failures for rate limiting PR
JoyboyBrian Nov 15, 2025
[reward] fix: Introduce class-level flag for PrimeRewardLoopManager initialization

Added a new class-level flag `_prime_class_initialized` to manage the initialization state of the `PrimeRewardLoopManager`. This change ensures that the class can properly initialize its semaphore without conflicts with the base class's initialization logic.
JoyboyBrian committed Nov 12, 2025
commit 7fffa8f6d406319501512e1caf940c944e6fcee8
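
A minimal sketch of one way a shared initialization flag can conflict, and how a subclass-owned flag avoids it. `BaseManager`, `SharedFlagManager`, `OwnFlagManager`, and the plain-dict config are hypothetical stand-ins; the real `RewardLoopManagerBase.init_class` is assumed here to guard itself with `_class_initialized`, which this commit does not show.

```python
class BaseManager:
    """Hypothetical stand-in for the base class (assumed behavior)."""

    _class_initialized = False

    @classmethod
    def init_class(cls, config, tokenizer):
        if cls._class_initialized:
            return
        cls._tokenizer = tokenizer
        # Sets the flag on whichever class the method was called on.
        cls._class_initialized = True


class SharedFlagManager(BaseManager):
    """Reuses the base flag, so its own setup is skipped."""

    _limit = None

    @classmethod
    def init_class(cls, config, tokenizer):
        super().init_class(config, tokenizer)
        if cls._class_initialized:  # already True after the base call
            return
        cls._limit = config.get("max_concurrent", 1)
        cls._class_initialized = True


class OwnFlagManager(BaseManager):
    """Tracks its own setup with a separate flag."""

    _limit = None
    _own_initialized = False

    @classmethod
    def init_class(cls, config, tokenizer):
        super().init_class(config, tokenizer)
        if cls._own_initialized:
            return
        cls._limit = config.get("max_concurrent", 1)
        cls._own_initialized = True


config = {"max_concurrent": 4}
SharedFlagManager.init_class(config, tokenizer=None)
OwnFlagManager.init_class(config, tokenizer=None)
shared_limit = SharedFlagManager._limit  # subclass setup never ran
own_limit = OwnFlagManager._limit        # subclass setup ran once
```

In this sketch the subclass that reuses the base flag never reaches its own setup, while the subclass with a dedicated flag does. This is the same shape as `_prime_class_initialized` in the diff below.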
9 changes: 7 additions & 2 deletions verl/experimental/reward/reward_loop/prime.py
@@ -59,6 +59,7 @@ class PrimeRewardLoopManager(RewardLoopManagerBase):
     # Class-level semaphore shared across all instances for global rate limiting
     _semaphore = None
     _max_concurrent = None
+    _prime_class_initialized = False
 
     @classmethod
     def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
@@ -67,7 +68,11 @@ def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
         This creates a class-level semaphore that is shared by all PrimeRewardLoopManager
         instances, ensuring true global rate limiting across all agent loop workers.
         """
-        if cls._class_initialized:
+        # Call parent init_class first
+        super().init_class(config, tokenizer)
+
+        # Use our own class-level flag to avoid conflicts with base class
+        if cls._prime_class_initialized:
             return
 
         cls._max_concurrent = config.reward_model.get("max_concurrent", 1)
@@ -78,7 +83,7 @@ def init_class(cls, config: DictConfig, tokenizer: AutoTokenizer):
                 f"This semaphore is shared across all agent loop workers for global rate limiting."
             )
 
-        cls._class_initialized = True
+        cls._prime_class_initialized = True
 
     def __init__(self, config, tokenizer, compute_score=None, reward_router_address=None, reward_model_tokenizer=None):
         super().__init__(config, tokenizer)
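
The class-level semaphore described in the docstring above can be illustrated with a small sketch. `LimitedWorker`, `score`, `run_demo`, and the `tracker` dict are hypothetical names, not part of `prime.py`, and the sketch only covers instances that share one process and one event loop.

```python
import asyncio


class LimitedWorker:
    # Class-level state shared by every instance (hypothetical names).
    _semaphore = None
    _max_concurrent = None

    @classmethod
    def init_class(cls, max_concurrent):
        # Create the semaphore once; later calls are no-ops.
        if cls._semaphore is None:
            cls._max_concurrent = max_concurrent
            cls._semaphore = asyncio.Semaphore(max_concurrent)

    async def score(self, tracker):
        # Every instance acquires the same class-level semaphore.
        async with self._semaphore:
            tracker["active"] += 1
            tracker["peak"] = max(tracker["peak"], tracker["active"])
            await asyncio.sleep(0.01)  # stand-in for a reward computation
            tracker["active"] -= 1


async def run_demo(num_workers=6, max_concurrent=2):
    LimitedWorker.init_class(max_concurrent)
    tracker = {"active": 0, "peak": 0}
    workers = [LimitedWorker() for _ in range(num_workers)]
    await asyncio.gather(*(w.score(tracker) for w in workers))
    return tracker["peak"]


# Highest number of score() calls that were in flight at once.
peak = asyncio.run(run_demo())
```

Because the semaphore lives on the class and not on each instance, creating more workers does not raise the concurrency ceiling in this sketch.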