Skip to content

[misc] fix: sft SFT E2E CI test failure due to megatron engine#3786

Merged
vermouth1992 merged 3 commits intoverl-project:mainfrom
houminz:houmin/fix-megatron-engine
Oct 16, 2025
Merged

[misc] fix: sft SFT E2E CI test failure due to megatron engine#3786
vermouth1992 merged 3 commits intoverl-project:mainfrom
houminz:houmin/fix-megatron-engine

Conversation

@houminz
Copy link
Contributor

@houminz houminz commented Oct 16, 2025

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Fix the SFT E2E CI test failure caused by the Megatron engine integration (issue introduced in #3600).

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves a CI test failure within the SFT E2E tests concerning the Megatron engine. The primary fix, located in verl/models/mcore/util.py, correctly introduces a continue statement. This prevents a logical fall-through in the preprocess_packed_seqs_no_padding function when cp_size <= 1, which is a critical correction. The other modifications, such as removing a debug print statement and enhancing comments by replacing FIXMEs with detailed explanations, improve the code's clarity and maintainability. The changes are accurate and well-implemented, making the PR ready for merging.

@houminz houminz force-pushed the houmin/fix-megatron-engine branch from 8808518 to 53d0f0f Compare October 16, 2025 13:06
@vermouth1992 vermouth1992 merged commit 65b8bf1 into verl-project:main Oct 16, 2025
81 of 88 checks passed
chenhaiq pushed a commit to The-Hierophant/verl-1 that referenced this pull request Nov 18, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants