[ray, worker] feat: DAPO x LLM as a Judge (GenRM) #1953
Conversation
Force-pushed from 3fd62f9 to d51bdd2.
Could you please help review this GenRM PR, @mark-myzhao? Thanks!
I implemented a similar idea, also reusing the ActorRollout worker class. What I don't quite understand is why an extra resource pool is needed here to allocate resources: isn't placing the GRM on a separate node somewhat wasteful of GPUs?
I also tried to run the GRM and the actor in the same resource pool, but it hit CUDA OOM during training. In addition, vLLM rollout only supports one rollout worker with sleep_mode enabled per process. That's why I use a separate resource pool to set up the GRM. @Dada-Cloudzxy
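For context, a minimal sketch of what a separate pool for the GenRM could look like with verl's `ResourcePoolManager`; the pool names, GPU counts, and the role used for the judge are illustrative assumptions, not this PR's actual configuration.

```python
# Sketch only: give the generative reward model its own resource pool so its
# vLLM engine never shares GPU memory with the actor/rollout workers.
from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role

resource_pool_spec = {
    "actor_pool": [8],  # one node with 8 GPUs for actor + rollout (assumed sizes)
    "genrm_pool": [4],  # a separate node with 4 GPUs for the GenRM
}
mapping = {
    Role.ActorRollout: "actor_pool",
    Role.RewardModel: "genrm_pool",  # judge mapped to its own pool
}
resource_pool_manager = ResourcePoolManager(
    resource_pool_spec=resource_pool_spec, mapping=mapping
)
```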
I found a way to run an extra vLLM engine in the same process, see #1193 (comment).
It's so cool!!!!
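For readers unfamiliar with the sleep_mode mentioned above, here is a minimal sketch of the vLLM feature itself (not the specific workaround described in #1193); the model name and sleep level are illustrative assumptions.

```python
# Sketch only: a sleeping vLLM engine offloads its weights and frees its KV cache,
# so another model can use the GPU memory between generation phases.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)

outputs = llm.generate(["What is 2 + 2?"], SamplingParams(max_tokens=32))

llm.sleep(level=1)   # release GPU memory while another workload runs
# ... run the other model / training step here ...
llm.wake_up()        # restore the engine before the next generation round
```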
My original idea was to wrap it in MCP, so that APIs for different tasks could be freely customized.
@llm-player-01 Hi, could I add you on WeChat to ask a few questions?
Happy to chat, feel free to DM me; just leave an email.
Does this approach actually overlap rollout and GenRM inference? If not, it is simpler to just call an inference API inside the reward function. After all, when the GenRM is large, GPU memory cannot hold the GenRM and the other parameters at the same time.
Yes, the effect is the same as calling an API. This scheme suits cases where you don't want to maintain a long-running inference service and prefer a judge that is spun up alongside training and used on demand.
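As a rough illustration of the "call an inference API inside the reward function" alternative discussed above, here is a hedged sketch against an OpenAI-compatible endpoint; the endpoint URL, model name, grading prompt, and score parsing are all illustrative assumptions.

```python
# Sketch only: score a rollout by querying an already-served judge model.
from openai import OpenAI

client = OpenAI(base_url="http://genrm-server:8000/v1", api_key="EMPTY")

def judge_reward(question: str, response: str) -> float:
    """Ask the served GenRM to grade one rollout and return a scalar reward."""
    prompt = (
        "Rate the following answer from 0 to 10.\n"
        f"Question: {question}\nAnswer: {response}\nScore:"
    )
    completion = client.chat.completions.create(
        model="genrm",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=8,
        temperature=0.0,
    )
    text = completion.choices[0].message.content.strip()
    try:
        return float(text.split()[0]) / 10.0  # normalize to [0, 1]
    except ValueError:
        return 0.0  # fall back if the judge output is not parseable
```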
@wus2000 No, it doesn't. Are the actor and genrm parameters the same?
Checklist Before Starting
Format the PR title as `[modules] type: Title`, where modules is one or more of fsdp, megatron, sglang, vllm, rollout, trainer, tests, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt (separated by `,` or space) and type is one of feat, fix, doc, refactor, chore, like `[megatron, fsdp] feat: xxx`.
What does this PR do?
Specific Changes
Usage Example
Test
Additional Info.
Checklist Before Submitting
Add [BREAKING] to the PR title if it breaks any API.