fix: correctly handle the inconsistency in reward_extra_info and reward_tensor #1044
+32
−17
If we are using multiple datasources for training/test, some reward functions may return a float and some may return a dict (including extra_info), depending on the datasource.
In this case, the `reward_tensor` and `reward_extra_info` returned by `RewardManager` will have different shapes, which causes the error in #1031. This PR fixes the inconsistency by padding a placeholder 'unknown' for those reward functions that only return a float. This placeholder is not processed in the code below and has no effect on correctness:
verl/verl/trainer/ppo/metric_utils.py, lines 227 to 233 in dc1714a
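As a rough illustration of the padding idea (this is a minimal sketch, not the actual PR code; the helper name `pad_reward_extra_info` and the `placeholder` argument are hypothetical), each extra-info key is padded so that every list has exactly one entry per sample, keeping it aligned with the batch dimension of `reward_tensor`:

```python
def pad_reward_extra_info(per_sample_scores, placeholder="unknown"):
    """Build reward_extra_info from mixed float/dict reward outputs.

    per_sample_scores: one entry per sample, either a float score or a
    dict containing "score" plus extra-info fields. Samples whose reward
    function returned only a float get `placeholder` for every key, so
    all lists stay the same length as the batch.
    """
    # Collect every extra-info key seen across the batch.
    keys = set()
    for s in per_sample_scores:
        if isinstance(s, dict):
            keys.update(k for k in s if k != "score")

    # Pad with the placeholder wherever a sample has no value for a key.
    reward_extra_info = {k: [] for k in keys}
    for s in per_sample_scores:
        for k in keys:
            value = s.get(k, placeholder) if isinstance(s, dict) else placeholder
            reward_extra_info[k].append(value)
    return reward_extra_info
```

With this shape guarantee, downstream metric code can iterate over `reward_extra_info` without special-casing float-only reward functions; the 'unknown' entries are simply skipped when metrics are aggregated.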