
[ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError #2010

Merged
ETOgaosion merged 4 commits into verl-project:main from lxg2015:main_converter on Jun 13, 2025


lxg2015 (Contributor) commented Jun 13, 2025

Checklist Before Starting

  • Searched for similar PR(s).
  • Checked PR Title format
    • In format of: [modules] type: Title
    • modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc
    • type is in feat, fix, refactor, chore
    • can involve multiple modules, separated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

When converting an HF checkpoint to mcore with --test, an AttributeError is raised. This PR fixes it.

```sh
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 305, in convert_hf_to_mcore
[rank0]:     test_conversion(megatron_model_provider, tfconfig, output_path, model)
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 78, in test_conversion
[rank0]:     assert dut_data.shape == ref_state_dict.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}"
[rank0]: AttributeError: 'dict' object has no attribute 'shape'
```
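The traceback shows the assertion reading `.shape` from `ref_state_dict`, which is a dict, while its own message already prints `ref_data.shape`. A minimal sketch of the per-parameter check, assuming the intended comparison is against that per-name `ref_data` (the actual diff is not shown here). `compare_state_dict_shapes` is a hypothetical helper, and `SimpleNamespace` objects stand in for tensors so the example runs without PyTorch.

```python
from types import SimpleNamespace


def compare_state_dict_shapes(dut_state_dict, ref_state_dict):
    """Hypothetical helper: assert each converted tensor matches its reference shape.

    Reading `.shape` from `ref_state_dict` itself (a dict) is what raises
    AttributeError in the traceback; the check has to use the per-name entry.
    """
    for name, dut_data in dut_state_dict.items():
        ref_data = ref_state_dict[name]  # per-parameter reference, not the whole dict
        assert dut_data.shape == ref_data.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}"
    return len(dut_state_dict)  # number of parameters checked


if __name__ == "__main__":
    # Stand-ins for tensors: only the `.shape` attribute matters here.
    dut = {"embedding.weight": SimpleNamespace(shape=(32000, 4096))}
    ref = {"embedding.weight": SimpleNamespace(shape=(32000, 4096))}
    print(compare_state_dict_shapes(dut, ref))  # prints 1

    # Reproduces the original failure mode: a dict has no `.shape` attribute.
    try:
        _ = ref.shape
    except AttributeError as exc:
        print(exc)  # 'dict' object has no attribute 'shape'
```

With the lookup done per name, a genuine mismatch surfaces as an AssertionError that names the offending parameter instead of an AttributeError.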

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that cover the code path.

ETOgaosion (Collaborator) commented Jun 13, 2025

Thanks for the contribution!

Actually, it may be hard to test the converter since there is no reference to compare against, but testing whether it runs is fine, so we can enable the test here.

ETOgaosion (Collaborator) commented

@lxg2015, could you help fix the checkpoint tests?

ETOgaosion merged commit 6681e25 into verl-project:main on Jun 13, 2025
34 of 37 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 18, 2025
…buteError (verl-project#2010)

### Checklist Before Starting

- [x] Searched for similar PR(s).
- [x] Checked PR Title format
  - In format of: [modules] type: Title
- modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci,
training_utils, recipe, hardware, deployment, ray, worker,
single_controller, misc, perf, model, algo, env, tool, ckpt, doc`
  - type is in `feat, fix, refactor, chore`
- can involve multiple modules, separated by `,` or space, like
`[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

> When converting an HF checkpoint to mcore with --test, an AttributeError
is raised. This PR fixes it.

```sh
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 305, in convert_hf_to_mcore
[rank0]:     test_conversion(megatron_model_provider, tfconfig, output_path, model)
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 78, in test_conversion
[rank0]:     assert dut_data.shape == ref_state_dict.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}"
[rank0]: AttributeError: 'dict' object has no attribute 'shape'
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title `description` if it breaks any
API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.

---------

Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Tyizhanshen pushed a commit to HyperdriveHustle/verl that referenced this pull request Jul 1, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026