[recipe] fix: Qwen3-vl npu patch by leisuzz · Pull Request #4186 · verl-project/verl

leisuzz · 2025-11-18T11:50:23Z

What does this PR do?

1. Qwen3-vl with FSDP requires transformers > 4.57.1; with 4.57.1 or earlier, a bug occurs inside transformers.
2. Adds an NPU patch for Qwen3-vl to improve performance.
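The version requirement above could be enforced up front, so users see a clear message instead of an error deep inside transformers. The sketch below is hypothetical: none of these helper names come from verl or this PR. It uses only the standard library and compares just the numeric release components, so a production check would more likely use `packaging.version.Version`.

```python
from importlib.metadata import PackageNotFoundError, version

# Exclusive lower bound: the PR states Qwen3-vl + FSDP needs transformers > 4.57.1.
_EXCLUSIVE_MINIMUM = "4.57.1"


def parse_release(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    "4.57.1" -> (4, 57, 1); "4.58.0.dev0" -> (4, 58, 0).
    Pre-release tags are ignored, so this only approximates PEP 440 ordering.
    """
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_new_enough(installed: str, exclusive_minimum: str = _EXCLUSIVE_MINIMUM) -> bool:
    """True if `installed` is strictly newer than `exclusive_minimum`."""
    return parse_release(installed) > parse_release(exclusive_minimum)


def check_transformers_for_qwen3_vl_fsdp() -> None:
    """Fail early with a readable message (hypothetical helper)."""
    try:
        installed = version("transformers")
    except PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not is_new_enough(installed):
        raise RuntimeError(
            f"Qwen3-vl with FSDP needs transformers > {_EXCLUSIVE_MINIMUM}, found {installed}"
        )
```

Such a guard would be called once, before the FSDP model is built.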

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

CLAassistant · 2025-11-18T11:50:30Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request introduces support for qwen3-vl models by adding NPU-specific patches to improve performance. The changes include new NPU-accelerated modules for Mixture-of-Experts (MoE) layers, optimized RMS normalization, and rotary position embeddings. The code correctly handles version differences in the transformers library. My review identified a potential critical issue in the custom autograd function for grouped matrix multiplication (GmmFunction_vl). The use of .T for tensor transposition may create a non-contiguous tensor view, which could lead to runtime errors or incorrect calculations with the custom NPU kernel. I've provided a suggestion to ensure the tensor is contiguous.
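For reference, the RMS normalization mentioned in the review computes y_i = w_i * x_i / sqrt(mean(x^2) + eps) over each hidden vector. The plain-Python sketch below implements only that formula. It is not the NPU kernel and not the patch's code; the `eps` default of 1e-6 is an assumption. A reference like this is handy for checking a fused kernel's output on small inputs.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Reference RMSNorm over one hidden vector: y_i = w_i * x_i / sqrt(mean(x^2) + eps)."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]
```

With unit weights and `eps=0`, the output has a mean square of exactly 1 (up to floating-point rounding), and the learned `weight` then rescales each channel.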

gemini-code-assist · 2025-11-18T11:52:15Z

verl/models/transformers/npu_patch.py

+            [grad_output], [weight], bias=None, group_list=group_list, split_item=2, group_type=0, group_list_type=1
+        )[0]
+        grad_weight = torch_npu.npu_grouped_matmul(
+            [input_tensor.T],


The use of .T on input_tensor creates a non-contiguous view of the tensor. While many PyTorch operations can handle non-contiguous tensors, custom NPU kernels like npu_grouped_matmul often require contiguous inputs for performance reasons or to work correctly. This could lead to runtime errors or silent incorrect computations on NPU hardware. To ensure correctness and robustness, it's safer to explicitly make the tensor contiguous after transposing. Using .transpose(0, 1) is also more explicit for 2D tensors.
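To make the contiguity concern concrete, the pure-Python model below tracks only shapes and strides (in elements). It illustrates the idea and is not PyTorch code. Transposing swaps shape and strides without moving any data, so a (2, 3) tensor with strides (3, 1) becomes a (3, 2) view with strides (1, 3), which is no longer row-major. `.contiguous()` copies the data into a fresh row-major buffer; in this model that means recomputing the strides. In PyTorch itself, the corresponding check is `Tensor.is_contiguous()`.

```python
Shape = tuple[int, ...]


def row_major_strides(shape: Shape) -> Shape:
    """Strides (in elements) of a freshly allocated row-major tensor."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous(shape: Shape, strides: Shape) -> bool:
    """Row-major check mirroring (to our understanding) PyTorch's rule for
    non-empty tensors: size-1 dimensions may have any stride."""
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def transpose_2d(shape: Shape, strides: Shape) -> tuple[Shape, Shape]:
    """Like .T / .transpose(0, 1): swap dims, keep the same underlying storage."""
    return (shape[1], shape[0]), (strides[1], strides[0])
```

The trade-off behind the suggestion is that `.contiguous()` costs one extra copy of `input_tensor`, in exchange for handing the kernel a layout it is guaranteed to accept.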

Suggested change

-            [input_tensor.T],
+            [input_tensor.transpose(0, 1).contiguous()],
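The transposed input in the `grad_weight` call of the diff excerpt above comes from the math of a grouped matmul. Below is a plain-Python reference for that math. It is not torch_npu and not the PR's code, and all names are hypothetical. It assumes tokens are already sorted by expert and that `group_sizes` lists how many rows each expert receives, which is our understanding of `group_list_type=1` (per-group counts) in `torch_npu.npu_grouped_matmul`. The forward pass is Y_g = X_g W_g. The backward pass is dX_g = dY_g W_g^T and dW_g = X_g^T dY_g, so the weight gradient needs the transposed input.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product; b must be non-empty."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def split_rows(x: Matrix, group_sizes: list[int]) -> list[Matrix]:
    """Split rows (tokens already sorted by expert) into consecutive groups."""
    if sum(group_sizes) != len(x):
        raise ValueError("group sizes must add up to the number of rows")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(x[start:start + size])
        start += size
    return groups


def grouped_matmul_forward(x: Matrix, weights: list[Matrix], group_sizes: list[int]) -> Matrix:
    """Y_g = X_g @ W_g for every group g, concatenated back in token order."""
    out: Matrix = []
    for x_g, w_g in zip(split_rows(x, group_sizes), weights):
        out.extend(matmul(x_g, w_g))
    return out


def grouped_matmul_backward(
    x: Matrix, weights: list[Matrix], grad_out: Matrix, group_sizes: list[int]
) -> tuple[Matrix, list[Matrix]]:
    """dX_g = dY_g @ W_g^T and dW_g = X_g^T @ dY_g for every group g."""
    grad_x: Matrix = []
    grad_w: list[Matrix] = []
    groups = zip(split_rows(x, group_sizes), weights, split_rows(grad_out, group_sizes))
    for x_g, w_g, dy_g in groups:
        if not x_g:
            # An expert that received no tokens gets an all-zero weight gradient.
            grad_w.append([[0.0] * len(w_g[0]) for _ in w_g])
            continue
        grad_x.extend(matmul(dy_g, transpose(w_g)))
        grad_w.append(matmul(transpose(x_g), dy_g))
    return grad_x, grad_w
```

A reference like this is mainly useful for comparing a fused kernel's outputs on tiny inputs. It also handles experts that receive zero tokens, which is an easy edge case to miss.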

ji-huazhong

I’m on board with the PR itself, but our NPU-patch has added more and more Transformers compatibility changes over time—leading to noticeable maintenance difficulties. It’s time to upgrade the Transformers version on the NPU and clean up parts of the NPU-patch code accordingly.

FightingZhen · 2025-11-19T02:13:39Z

verl/models/transformers/npu_patch.py

    return final_hidden_states, router_logits


+class GmmFunction_vl(torch.autograd.Function):


How about renaming this class to GmmFunctionVL?

FightingZhen · 2025-11-19T02:16:03Z

verl/models/transformers/npu_patch.py

+        return grad_input, grad_weight, None
+
+
+class Qwen3VLMoeTextExperts_npu(nn.Module):


Same here: adding _ to a class name is discouraged. How about NPUQwen3VLMoeTextExperts?

FightingZhen · 2025-11-19T02:33:59Z

Replying to ji-huazhong’s comment above ("It’s time to upgrade the Transformers version on the NPU and clean up parts of the NPU-patch code accordingly."):

The Transformers version used by verl for Ascend NPU has been updated to the latest release (v4.57.1), including in the current e2e_ascend CI. We can probably clean up some of the redundant patching now.


leisuzz requested review from FightingZhen, PeterSH6, ji-huazhong and vermouth1992 as code owners November 18, 2025 11:50

leisuzz changed the title from "[receipe] supports for qwen3-vl" to "[recipe] fix: Qwen3-vl npu patch" on Nov 18, 2025

gemini-code-assist bot reviewed Nov 18, 2025


ji-huazhong approved these changes Nov 19, 2025


FightingZhen reviewed Nov 19, 2025


[receipe] supports for qwen3-vl

558bdac

leisuzz force-pushed the npu branch from 386822c to 558bdac on November 19, 2025 02:34

FightingZhen approved these changes Nov 19, 2025


FightingZhen merged commit 37ec013 into verl-project:main Nov 19, 2025
79 of 80 checks passed

quancs mentioned this pull request on Nov 23, 2025 in #4244 (open): "Failed to create actor. You set the async flag, but the actor does not have any coroutine functions."
