0x404 commented Apr 12, 2025

When multiple data sources are used for training or testing, some reward functions may return a float while others return a dict (including extra_info), depending on the data source.

In that case, the reward_tensor and reward_extra_info returned by RewardManager have different shapes, which causes the error in #1031.

This PR fixes the inconsistency by padding a placeholder, 'unkown', for reward functions that only return a float. The placeholder is a string, so the metric calculation skips it and it has no effect on correctness:

    # Calculate metrics for each group
    data_src2prompt2var2metric = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for data_source, prompt2var2vals in data_src2prompt2var2vals.items():
        for prompt, var2vals in prompt2var2vals.items():
            for var_name, var_vals in var2vals.items():
                if isinstance(var_vals[0], str):
                    continue
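
For reviewers, the padding idea can be sketched roughly as below. This is an illustration only, not the code in this PR: the function name `pad_reward_extra_info`, the `"score"` key, and the `PLACEHOLDER` value are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical placeholder; any string works, because string values
# are skipped by the metric calculation shown above.
PLACEHOLDER = "unknown"


def pad_reward_extra_info(results, placeholder=PLACEHOLDER):
    """Align per-sample scores and extra info (illustrative sketch).

    `results` holds one entry per sample: either a bare float, or a dict
    with a "score" key plus extra fields (the key name is an assumption).
    """
    # First pass: collect every extra key that any sample reports.
    extra_keys = []
    for result in results:
        if isinstance(result, dict):
            for key in result:
                if key != "score" and key not in extra_keys:
                    extra_keys.append(key)

    # Second pass: every extra-info list gets exactly one entry per sample,
    # padded with the placeholder when the sample has no value for that key.
    scores = []
    extra_info = defaultdict(list)
    for result in results:
        if isinstance(result, dict):
            scores.append(float(result["score"]))
            for key in extra_keys:
                extra_info[key].append(result.get(key, placeholder))
        else:
            scores.append(float(result))
            for key in extra_keys:
                extra_info[key].append(placeholder)
    return scores, dict(extra_info)


if __name__ == "__main__":
    mixed = [0.5, {"score": 1.0, "acc": 1.0}, 0.0]
    scores, extra_info = pad_reward_extra_info(mixed)
    print(scores)
    print(extra_info)
```

With this kind of padding, every list in the extra info has the same length as the list of scores, so the two can be indexed by sample position without a mismatch.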

0x404 commented Apr 17, 2025

Hi @vermouth1992, could you please help review this PR? Much appreciated!
