[megatron] feat: use mbridge as megatron adaptor (#2064)
### What does this PR do?
MBridge provides a seamless bridge between Hugging Face models and
Megatron-Core's optimized implementation for efficient distributed
training and inference. It also offers necessary tools and processes for
integrating Reinforcement Learning (RL) with Megatron. See
https://github.com/ISEEKYAN/mbridge
mbridge is developed and maintained by NVIDIA and provides:
- modeling of HF models with Megatron
- loading and saving of HF-format weights with no memory overhead
- online export of parameters to the rollout engine through a per-tensor generator
- RL-specific optimizations and friendly APIs on the Megatron side, plus
some early-access Megatron features
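To illustrate the per-tensor generator idea from the list above, here is a minimal sketch in plain Python. It is not mbridge's API: the names `per_tensor_generator`, `load_into_rollout`, and the list-based "shards" are hypothetical stand-ins chosen for illustration. The point is only that parameters are gathered and handed over one at a time, so the full model never has to be materialized at once.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for sharded weights: each parameter is stored as a
# list of slices, one slice per rank.
Shards = Dict[str, List[List[float]]]


def per_tensor_generator(shards: Shards) -> Iterator[Tuple[str, List[float]]]:
    """Yield (name, full_tensor) for one parameter at a time."""
    for name, parts in shards.items():
        # Gather the slices of this parameter only. Previously yielded
        # tensors are no longer referenced here and can be freed.
        full = [value for part in parts for value in part]
        yield name, full


def load_into_rollout(weights: Iterator[Tuple[str, List[float]]]) -> Dict[str, int]:
    """Hypothetical consumer: records the size of each tensor it receives."""
    received = {}
    for name, tensor in weights:
        received[name] = len(tensor)
    return received


shards = {
    "embed.weight": [[0.1, 0.2], [0.3, 0.4]],
    "lm_head.weight": [[1.0], [2.0]],
}
sizes = load_into_rollout(per_tensor_generator(shards))
```

Because the consumer pulls from a generator, peak memory in this sketch is bounded by the largest single parameter rather than by the whole model.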
With mbridge, the direct improvements are:
- a clean interface for megatron
- no offline dist_ckpt conversion needed
- no offline model merger needed
### Test
Tested on GSM8K with Qwen2-7B-Instruct:
<img width="486" alt="image"
src="https://github.com/user-attachments/assets/dd271e8a-9167-470f-8b0c-dde2bcfe1800"
/>
### High-Level Design
Add an option `actor_rollout_ref.actor.megatron.use_mbridge`, which defaults
to False. Set it to True to enable mbridge. When enabled,
model_instantiate, model_init_load, checkpoint_save, checkpoint_load, and
per_tensor_generator are taken over by mbridge.
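The flag-gated design can be pictured with a small sketch. This is not the code in this PR: `MegatronConfig` and `select_backend` are hypothetical names, and only the option name, its default, and the five step names come from the description above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MegatronConfig:
    # Mirrors the documented default of the new option.
    use_mbridge: bool = False


# The five steps the description says mbridge takes over when enabled.
STEPS = (
    "model_instantiate",
    "model_init_load",
    "checkpoint_save",
    "checkpoint_load",
    "per_tensor_generator",
)


def select_backend(cfg: MegatronConfig) -> Dict[str, str]:
    """Map each step to the implementation that would handle it (hypothetical)."""
    owner = "mbridge" if cfg.use_mbridge else "builtin"
    return {step: owner for step in STEPS}


default_plan = select_backend(MegatronConfig())
bridge_plan = select_backend(MegatronConfig(use_mbridge=True))
```

With the default configuration every step stays on the existing path, so the option is opt-in and leaves current behavior unchanged.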
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
Add this line to the script:
```
actor_rollout_ref.actor.megatron.use_mbridge=True \
```
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.