Add ray-based distributed reward manager #1206

rawsh · 2025-04-22T23:20:43Z

Add a Ray reward manager to support cross-node parallelization of verification.

Tested with math-verify: verification time dropped to a consistent ~3 s for 128 x 16 = 2048 generations and ~30 s for 256 x 64 = 16384 generations, whereas the naive reward manager takes 200+ seconds.
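For readers unfamiliar with the pattern, here is a minimal sketch of fanning verification out as Ray tasks. It is not the code in this PR: `exact_match`, `chunk`, `score_chunk`, and `score_with_ray` are hypothetical names, and a trivial string comparison stands in for math-verify. Only `ray.init`, `ray.is_initialized`, `ray.remote`, and `ray.get` are real Ray calls.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (model response, ground-truth answer)


def exact_match(response: str, ground_truth: str) -> float:
    """Stand-in verifier (hypothetical); the PR was tested with math-verify."""
    return 1.0 if response.strip() == ground_truth.strip() else 0.0


def chunk(items: Sequence[Sample], size: int) -> List[List[Sample]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def score_chunk(
    samples: Sequence[Sample],
    verify: Callable[[str, str], float] = exact_match,
) -> List[float]:
    """Score one chunk sequentially; this is the unit of work sent to Ray."""
    return [verify(response, truth) for response, truth in samples]


def score_with_ray(samples: Sequence[Sample], chunk_size: int = 64) -> List[float]:
    """Score all samples by running one Ray task per chunk."""
    # Imported lazily so the helpers above remain usable without Ray installed.
    import ray

    if not ray.is_initialized():
        # Assumption: default init; on a multi-node cluster you would
        # normally pass the cluster address here.
        ray.init()

    remote_score = ray.remote(score_chunk)
    futures = [remote_score.remote(part) for part in chunk(samples, chunk_size)]
    # ray.get returns results in the same order as the futures list,
    # so the flattened scores line up with the input samples.
    return [score for part in ray.get(futures) for score in part]
```

Chunking keeps the per-task overhead small relative to the verification work; the chunk size of 64 is an arbitrary illustration, not a value taken from the PR.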

CLAassistant · 2025-04-22T23:23:05Z

All committers have signed the CLA.

rawsh · 2025-04-29T06:32:10Z

apologies for the update noise, cleaned it up a good bit

rbao2018 · 2025-05-09T08:03:31Z

Hello. I've implemented something similar. I've found that when the Ray cluster is started, only the master node uses all of its CPU cores for the computation, while the other worker nodes don't. Should we turn this function into the form of a working group?
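One way to check an observation like this is to count how many tasks land on each node. The sketch below is an assumption-laden probe, not something from this PR or from the commenter's setup: `tasks_per_node` and `probe_cluster` are hypothetical helpers, and whether Ray's `"SPREAD"` scheduling strategy resolves the imbalance described here is untested.

```python
from collections import Counter
from typing import Dict, Iterable


def tasks_per_node(node_ids: Iterable[str]) -> Dict[str, int]:
    """Count how many tasks reported each node ID."""
    return dict(Counter(node_ids))


def probe_cluster(num_tasks: int = 256, spread: bool = True) -> Dict[str, int]:
    """Launch trivial Ray tasks and report how many ran on each node."""
    # Imported lazily so tasks_per_node works without Ray installed.
    import ray

    if not ray.is_initialized():
        # Assumption: default init; pass the cluster address when needed.
        ray.init()

    def where_am_i() -> str:
        # Returns the ID of the node that executed this task.
        return ray.get_runtime_context().get_node_id()

    options = {"num_cpus": 1}
    if spread:
        # Ask Ray to spread tasks across nodes instead of packing them.
        options["scheduling_strategy"] = "SPREAD"

    remote_fn = ray.remote(**options)(where_am_i)
    node_ids = ray.get([remote_fn.remote() for _ in range(num_tasks)])
    return tasks_per_node(node_ids)
```

Comparing `probe_cluster(spread=False)` with `probe_cluster(spread=True)` on the affected cluster would show whether the default scheduling is what keeps work on the master node.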

rawsh · 2025-08-22T00:59:47Z

I think the approach in #1693 is cleaner.

rawsh force-pushed the ray_verify branch 2 times, most recently from 0254cc8 to 9c68bc5 Compare April 23, 2025 08:35

rawsh changed the title ~~[draft] Ray-based parallel verification reward managers~~ add ray-based distributed reward manager Apr 23, 2025

rawsh changed the title ~~add ray-based distributed reward manager~~ Add ray-based distributed reward manager Apr 23, 2025

rawsh force-pushed the ray_verify branch 3 times, most recently from 227ac7a to 261b591 Compare April 24, 2025 05:15

wuxibin89 mentioned this pull request Apr 28, 2025

Error: Can't pickle local object 'get_custom_reward_fn.<locals>.wrapped_fn In Prime Reward Manager #1293

Closed

feat: add ray reward manager

7156b25

rawsh force-pushed the ray_verify branch from 261b591 to 7156b25 Compare April 29, 2025 06:31

ZihengJiang added the status: need review label Apr 29, 2025

eric-haibin-lin self-assigned this Jul 4, 2025

5 participants
