@rawsh

rawsh commented Apr 22, 2025

Adds a Ray-based reward manager to support cross-node parallelization of verification.

Tested with math-verify: verification time dropped to a consistent ~3 s for 128 × 16 = 2048 generations and ~30 s for 256 × 64 = 16384 generations, while the naive reward manager took 200+ seconds.
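
A minimal sketch of the fan-out idea, assuming an already-initialized Ray cluster and a per-sample scorer `score_fn(response, ground_truth) -> float` (for example, a wrapper around math-verify). The helpers `chunk`, `score_chunk`, and `ray_parallel_scores` and the default `chunk_size` are hypothetical illustrations, not the code in this PR. Samples are batched into chunks so that per-task scheduling overhead is amortized, and `scheduling_strategy="SPREAD"` asks Ray to distribute tasks across the available nodes instead of packing them onto the driver's node.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-sample scorer signature: (response, ground_truth) -> score.
ScoreFn = Callable[[str, str], float]


def chunk(items: Sequence, size: int) -> List[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def score_chunk(score_fn: ScoreFn, pairs: List[Tuple[str, str]]) -> List[float]:
    """Score one chunk serially; this is the unit of work shipped to a Ray task."""
    return [score_fn(resp, gt) for resp, gt in pairs]


def ray_parallel_scores(score_fn: ScoreFn, responses: Sequence[str],
                        ground_truths: Sequence[str],
                        chunk_size: int = 64) -> List[float]:
    """Fan chunks out as Ray tasks and gather scores in the original order."""
    import ray  # imported lazily so the helpers above work without Ray installed

    # SPREAD asks the scheduler to spread tasks among available nodes
    # rather than filling up the driver's node first.
    remote_score = ray.remote(score_chunk).options(scheduling_strategy="SPREAD")
    pairs = list(zip(responses, ground_truths))
    refs = [remote_score.remote(score_fn, c) for c in chunk(pairs, chunk_size)]
    # ray.get on a list preserves the order of `refs`, so flattening
    # restores the original sample order.
    return [score for part in ray.get(refs) for score in part]
```

The chunk size trades per-task overhead against load balance: larger chunks mean fewer tasks to schedule but coarser distribution across workers.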

@CLAassistant

CLAassistant commented Apr 22, 2025

CLA assistant check
All committers have signed the CLA.

rawsh force-pushed the ray_verify branch 2 times, most recently from 0254cc8 to 9c68bc5, April 23, 2025 08:35
rawsh changed the title from "[draft] Ray-based parallel verification reward managers" to "add ray-based distributed reward manager" Apr 23, 2025
rawsh changed the title from "add ray-based distributed reward manager" to "Add ray-based distributed reward manager" Apr 23, 2025
rawsh force-pushed the ray_verify branch 3 times, most recently from 227ac7a to 261b591, April 24, 2025 05:15
@rawsh
Author

rawsh commented Apr 29, 2025

Apologies for the update noise; I've cleaned it up a good bit.

@rbao2018

rbao2018 commented May 9, 2025

Hello. I've implemented something similar. However, I've found that when the Ray cluster is started, only the master node uses all of its CPU cores for computation, while the other worker nodes don't. Should we turn this function into a worker group?

eric-haibin-lin self-assigned this Jul 4, 2025
@rawsh
Author

rawsh commented Aug 22, 2025

I think the approach in #1693 is cleaner.
