[sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking#1668
Conversation
I asked several friends to validate this, stay tuned @jybsuper. After validation, we should add documentation in verl @SwordFaith, then we can merge.
Great job, big thanks to Yanbin! Excited to hear back from you about the review comments!
LGTM
Hi @vermouth1992, this PR broke the async vLLM server. Considering this was an update for SGLang, I am quite surprised. CC @wuxibin89. Here is the problem:
Hi @casper-hansen, thanks for reporting this issue! That said, I'm quite surprised that vLLM has been repurposing the chat template. Given the collaborative nature of VeRL and the number of moving parts, tracking cross-PR dependencies is indeed challenging, especially when PRs have long review cycles. This is exactly where proper tests become critical. I'd suggest updating the test here to cover this case. Let me know if you'd like help fixing this; I'm happy to contribute a patch to resolve the issue properly.
@jybsuper I would appreciate a patch for this so that the ChatCompletionScheduler can use similar config arguments. I do have a preference for the scheduler because of how easy it makes implementing custom multi-turn training workflows.
Sounds good. I will create a PR with a fix soon.
Fixed regressions from verl-project#1668 and verl-project#1933; added e2e tests for both sglang and vllm async mode.
Checklist Before Starting
What does this PR do?
Implement efficient, model-agnostic multi-turn message tokenization and masking, based solely on the chat template.
Specific Changes
Challenges
Current rollout requires hand-crafting tokenization and loss-masking rules for every chat template, leading to verbose, error-prone if/else logic whenever a new template is added.
On each generation turn, we re-tokenize the entire conversation history, wasting time on duplicated work.
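To make the duplicated-work problem concrete, here is a toy sketch. The render() and toy_tokenize() helpers are hypothetical stand-ins (for apply_chat_template(tokenize=False) and a real tokenizer), not the PR's actual code; they only illustrate how the naive loop's tokenization cost grows quadratically with the number of turns:

```python
def render(messages):
    # Toy stand-in for tokenizer.apply_chat_template(tokenize=False).
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def toy_tokenize(text):
    # Toy whitespace "tokenizer"; a real tokenizer would return input_ids.
    return text.split()

history, tokens_processed = [], 0
for turn in range(4):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})
    # Naive rollout: re-render and re-tokenize the ENTIRE history every turn.
    ids = toy_tokenize(render(history))
    tokens_processed += len(ids)

print(tokens_processed, "tokens processed for a",
      len(toy_tokenize(render(history))), "token chat")
# → 40 tokens processed for a 16 token chat
```

Even in this tiny 4-turn example the naive loop tokenizes 2.5x more tokens than the final conversation contains, and the ratio grows with the number of turns.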
Solution
apply_chat_template(tokenize=False) is far faster than with tokenize=True.
Usage Example
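The PR's actual API is not reproduced in this excerpt. The following is a minimal sketch of the underlying idea, under the same assumptions as above: a hypothetical render() stands in for apply_chat_template(tokenize=False) and toy_tokenize() for a real tokenizer. Each new message is rendered as a text delta, only that delta is tokenized, and the loss mask flags assistant tokens:

```python
def render(messages):
    # Toy stand-in for tokenizer.apply_chat_template(tokenize=False).
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def toy_tokenize(text):
    # Toy whitespace "tokenizer"; a real tokenizer would return input_ids.
    return text.split()

class IncrementalChat:
    """Tokenize each new message's rendered delta instead of the full history."""

    def __init__(self):
        self.messages, self.rendered = [], ""
        self.ids, self.loss_mask = [], []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        full = render(self.messages)
        delta_ids = toy_tokenize(full[len(self.rendered):])  # only the new suffix
        self.ids += delta_ids
        # Mask = 1 only for assistant tokens (the ones trained on).
        self.loss_mask += [int(role == "assistant")] * len(delta_ids)
        self.rendered = full

chat = IncrementalChat()
chat.add("user", "question 0")
chat.add("assistant", "answer 0")
chat.add("user", "question 1")
chat.add("assistant", "answer 1")
print(chat.loss_mask)  # → [0, 0, 1, 1, 0, 0, 1, 1]
```

Because masking is derived from the rendered delta of each message rather than from per-template string matching, no model-specific if/else rules are needed.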
Test
Correctness validation
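The original validation code was not captured in this scrape. A minimal sanity check of the prefix-diff idea (again with a hypothetical toy template and whitespace tokenizer, not the PR's real code) is that incrementally tokenizing each turn's rendered delta must reproduce one-shot tokenization of the full conversation:

```python
def render(messages):
    # Toy stand-in for tokenizer.apply_chat_template(tokenize=False).
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def toy_tokenize(text):
    # Toy whitespace "tokenizer"; a real tokenizer would return input_ids.
    return text.split()

messages = [{"role": ["user", "assistant"][i % 2], "content": f"msg {i}"}
            for i in range(6)]

rendered, incremental_ids = "", []
for k in range(1, len(messages) + 1):
    full = render(messages[:k])
    incremental_ids += toy_tokenize(full[len(rendered):])
    rendered = full

# Incremental tokens must match one-shot tokenization of the whole chat.
assert incremental_ids == toy_tokenize(render(messages))
print("match:", len(incremental_ids), "tokens")
```

With a real tokenizer this equality can fail when a token spans a message boundary, which is why checking it against the concrete chat template in use matters.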
Speed Benchmark
Simulated multi-turn rollout using snapshot data from a prior RL experiment:
Testing code:
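The original benchmark script was not preserved in this thread. A self-contained stand-in contrasting the two strategies (toy template and tokenizer; all names hypothetical) could look like:

```python
import time

def render(messages):
    # Toy stand-in for tokenizer.apply_chat_template(tokenize=False).
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

def toy_tokenize(text):
    # Toy whitespace "tokenizer"; a real tokenizer would return input_ids.
    return text.split()

messages = [{"role": ["user", "assistant"][i % 2],
             "content": f"message {i} " * 50} for i in range(40)]

# Strategy A (status quo): re-tokenize the whole rendered history each turn.
t0 = time.perf_counter()
for k in range(1, len(messages) + 1):
    naive_ids = toy_tokenize(render(messages[:k]))
naive_s = time.perf_counter() - t0

# Strategy B (this PR's idea): render as text, tokenize only each turn's delta.
t0 = time.perf_counter()
rendered, incr_ids = "", []
for k in range(1, len(messages) + 1):
    full = render(messages[:k])
    incr_ids += toy_tokenize(full[len(rendered):])
    rendered = full
incr_s = time.perf_counter() - t0

print("identical tokens:", incr_ids == naive_ids)
print(f"naive={naive_s:.4f}s incremental={incr_s:.4f}s")
```

The gap between the two timings widens with real tokenizers, since text rendering is much cheaper than tokenization.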
Result
Conversations with more turns or more tokens see proportionally greater speedups.
Additional Info.
Checklist Before Submitting
Add [BREAKING] to the PR title if it breaks any API.