
[megatron] feat: use mbridge as megatron adaptor #2064

Merged
ETOgaosion merged 14 commits into verl-project:main from ISEEKYAN:use_mbridge on Jul 3, 2025

Conversation

@ISEEKYAN (Collaborator)

What does this PR do?

MBridge provides a seamless bridge between Hugging Face models and Megatron-Core's optimized implementation for efficient distributed training and inference. It also offers the tools and processes needed to integrate Reinforcement Learning (RL) with Megatron. See https://github.com/ISEEKYAN/mbridge
mbridge is developed and maintained by NVIDIA and provides functions for:

  • modeling HF models with Megatron
  • loading/saving HF-format weights with no memory overhead
  • online export of parameters to the rollout engine via a per-tensor generator
  • RL-specific optimizations and friendly APIs on the Megatron side, plus some early-access Megatron features

With mbridge, the direct improvements are:

  • a clean interface for Megatron
  • no offline dist_ckpt conversion needed
  • no offline model merger needed
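
For context, a minimal usage sketch of mbridge's bridge API, adapted from what the mbridge README describes; the exact method names (`AutoBridge.from_pretrained`, `get_model`, `load_weights`, `export_weights`) are assumptions to be verified against the repository linked above:

```python
# Minimal mbridge usage sketch (method names adapted from the mbridge README;
# verify against https://github.com/ISEEKYAN/mbridge).
from mbridge import AutoBridge

HF_MODEL_PATH = "Qwen/Qwen2-7B-Instruct"  # any HF checkpoint path

# Build a bridge from the HF config, then materialize a Megatron-Core model.
bridge = AutoBridge.from_pretrained(HF_MODEL_PATH)
model = bridge.get_model()

# Load HF-format weights directly into the Megatron model
# (no offline dist_ckpt conversion step).
bridge.load_weights(model, HF_MODEL_PATH)

# Stream parameters tensor by tensor to the rollout engine
# (the per-tensor generator mentioned above).
for name, weight in bridge.export_weights(model):
    print(name, tuple(weight.shape))
```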

Test

Tested with GSM8K using Qwen2-7B-Instruct.
[screenshot: training results]

High-Level Design

Add an option actor_rollout_ref.actor.megatron.use_mbridge (default: False); set it to True to enable. When enabled, model instantiation, initial weight loading, checkpoint save/load, and the per-tensor generator are taken over by mbridge.
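
A rough sketch of what this takeover could look like in a worker; the helper names here (`legacy_megatron_model_provider`, `load_dist_checkpoint`) are hypothetical placeholders, not verl's actual code:

```python
# Hypothetical sketch of the use_mbridge branch; helper names are
# illustrative placeholders, not verl's actual worker implementation.
def build_model(config, hf_model_path):
    if config.actor_rollout_ref.actor.megatron.use_mbridge:
        from mbridge import AutoBridge  # method names assumed, see mbridge repo
        bridge = AutoBridge.from_pretrained(hf_model_path)
        model = bridge.get_model()
        bridge.load_weights(model, hf_model_path)  # HF weights, no dist_ckpt
        return model, bridge
    # Legacy path: hand-written Megatron model plus offline dist_ckpt loading.
    model = legacy_megatron_model_provider(config)            # hypothetical helper
    load_dist_checkpoint(model, config.checkpoint.load_path)  # hypothetical helper
    return model, None
```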

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Add this line to the script:

    actor_rollout_ref.actor.megatron.use_mbridge=True \

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that covers the code path.

Review comment on the megatron config (excerpt from the diff adding the new option):

    seed: 42
    override_transformer_config: {} # additional transformer config like: num_layers_in_first(/last)_pipeline_stage
    profile: # profile the actor model in `update_policy`
    use_mbridge: False
Collaborator:

Actually, shouldn't use_dist_checkpointing and mbridge be an either-or relation? Maybe we should use naming like io_methods.loading_backend/saving_backend to choose between huggingface/dist_checkpointing/mbridge?

Also, we may need to consider how this combines with the checkpoint configuration. Maybe merge these directly into checkpoint?

Collaborator:

@ccclyu @dataproblems , could you give some advice on the API design?

How should use_dist_checkpointing and use_mbridge work together for better integration? My original thinking:

checkpoint:
    pre_load:    # first time load
        format: [hf, dist_ckpt]   # hf default use_mbridge
    load:
        format: [hf, dist_ckpt]
    save:
        format: [hf, dist_ckpt]

But maybe this will break some APIs.

Collaborator:

I think the current way is OK in the config, since there can be a relationship between the load and save operations (the actor saves the model and the rollout loads it, in the case where the two are not colocated). However, we would need validation when the config is read to make sure the load and save options are compatible with each other.

Implementation-wise, I would add an abstraction that moves the checkpoint-saving logic out of the checkpoint manager and the workers. That way, the checkpoint manager and workers rely on a stable interface, and you can provide more options while modifying less code. Is that what you were looking for, or am I missing the point here?
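
A minimal sketch of what such a read-time validation could look like; the format names and the simple same-format rule are assumptions for illustration, not verl's actual policy:

```python
# Hypothetical config validation sketch; format names and the same-format
# rule are illustrative assumptions, not verl's actual policy.
VALID_FORMATS = {"hf", "dist_ckpt"}

def validate_checkpoint_config(load_format: str, save_format: str) -> None:
    for fmt in (load_format, save_format):
        if fmt not in VALID_FORMATS:
            raise ValueError(f"unknown checkpoint format: {fmt!r}")
    # If the actor saves and a non-colocated rollout loads, the formats
    # must line up; here we simply require them to match.
    if load_format != save_format:
        raise ValueError(
            f"load format {load_format!r} is incompatible with "
            f"save format {save_format!r}"
        )

validate_checkpoint_config("hf", "hf")  # passes; mismatched formats raise
```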

Collaborator:

Thanks, your latter part makes sense to me; that is a refactoring point, while here I'd like to focus on the API design.

So use_mbridge is a broader functional option that also covers model initialization, so it should work as in @ISEEKYAN's implementation. The remaining question is whether use_dist_checkpointing should migrate into the checkpoint config as a first-time-loading option. Since the API migration should not be part of this PR's changes, we will separate the feature development from the interface refactor. Is that OK?

cc @ISEEKYAN @dataproblems @ccclyu

@ISEEKYAN (Author):

It looks good to me.
More detail about mbridge: it will cover model init, parameter resharding, HF-format save/load, forward with sequence packing/fused kernels (to be added), and other potential Megatron-side improvements, as NVIDIA's solution for using Megatron in RL frameworks.

Collaborator:

The current config LGTM. Long term, if we migrate to mbridge, will use_dist_checkpointing be deprecated so that only the HF format is loaded?

@ISEEKYAN (Author):

Personally, I prefer using the HF format for the entire training lifetime.
But supporting dist_checkpointing or other formats like bytecheckpoint would make it more flexible when a user has a private pre-trained model. So the config might look like:

checkpoint:
    pre_load:    # first time load
        format: [hf, dist_ckpt, bytecheckpoint]   # hf default use_mbridge
    load_save:
        format: [hf, dist_ckpt, bytecheckpoint]

We would deprecate use_dist_checkpointing but keep it for a while, reminding users to switch to the new way, and we would update the example scripts accordingly.
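
A sketch of the kind of deprecation shim this implies; the config keys mirror the checkpoint layout proposed above but are still hypothetical, not verl's actual schema:

```python
# Hypothetical deprecation shim for use_dist_checkpointing; the config keys
# mirror the checkpoint layout proposed above, not verl's actual schema.
import warnings

def resolve_pre_load_format(cfg: dict) -> str:
    if cfg.get("use_dist_checkpointing"):
        warnings.warn(
            "use_dist_checkpointing is deprecated; set "
            "checkpoint.pre_load.format=dist_ckpt instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "dist_ckpt"
    return cfg.get("checkpoint", {}).get("pre_load", {}).get("format", "hf")

print(resolve_pre_load_format({"use_dist_checkpointing": True}))  # dist_ckpt
```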

@ETOgaosion ETOgaosion requested a review from ccclyu June 26, 2025 01:50
@eric-haibin-lin (Collaborator) left a comment:

Not necessarily for this PR, but is it possible to create some unit tests?

Comment on lines 207 to 219
    if self.bridge is not None:
        from verl.models.mcore.mbridge import freeze_moe_router

        post_model_creation_callbacks = []
        if override_model_config.get("moe_config", {}).get("freeze_moe_router", False):
            post_model_creation_callbacks.append(freeze_moe_router)

    # Step 3: initialize the megatron model
    def make_model(wrap_with_ddp=False):
        if self.bridge is not None:
            return self.bridge.get_model(
                post_model_creation_callbacks=post_model_creation_callbacks, wrap_with_ddp=wrap_with_ddp
            )
Collaborator:

Do you think we can move the post_model_creation_callbacks definition into the make_model method?
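
For illustration, the suggested refactor might look roughly like this (a sketch against the snippet above, not the final committed code):

```python
# Sketch of the suggested refactor: the callback list is built inside
# make_model instead of beforehand (not the final committed code).
def make_model(wrap_with_ddp=False):
    if self.bridge is not None:
        from verl.models.mcore.mbridge import freeze_moe_router

        post_model_creation_callbacks = []
        if override_model_config.get("moe_config", {}).get("freeze_moe_router", False):
            post_model_creation_callbacks.append(freeze_moe_router)
        return self.bridge.get_model(
            post_model_creation_callbacks=post_model_creation_callbacks,
            wrap_with_ddp=wrap_with_ddp,
        )
```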

@ISEEKYAN (Author):

Sure.

@ISEEKYAN (Author):

Just updated the implementation.

@ISEEKYAN (Author) commented Jul 2, 2025:

Not necessarily for this PR, but is it possible to create some unit tests?

Sure, I'll submit another PR with a small refactor to clean up megatron_worker.py and unified unit tests for the Megatron adaptation.

@ETOgaosion ETOgaosion merged commit 433544f into verl-project:main Jul 3, 2025
38 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jul 4, 2025
SuperCB pushed a commit to SuperCB/verl that referenced this pull request Jul 7, 2025
@Qin10 commented Jul 11, 2025:

Hi! I'd like to ask: does the mbridge mode currently support resuming training from a checkpoint?

@ISEEKYAN (Author):

Hi! I'd like to ask: does the mbridge mode currently support resuming training from a checkpoint?

mbridge supports load/save for the weights part, but the optimizer states should be saved in the distributed_checkpointing format.

@rj42 (Contributor) commented Jul 13, 2025:

@ISEEKYAN, hello.
Could you tell me, please, what needs to be done so that "optimizer states are saved in distributed_checkpointing format"? Is this done at the config level? Is there a ready-made working example?
I'd appreciate it.

@ETOgaosion (Collaborator):

@rj42 The optimizer-saving process in the mbridge implementation still needs a fix in order to save optimizer states.

oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
