
[sglang_rollout] feat: enable dynamic LoRA refresh for SGLang rollout #5220

Open

JohnConnor123 wants to merge 2 commits into verl-project:main from JohnConnor123:feature/sglang_dynamic_lora

Conversation

@JohnConnor123 (Contributor) commented Feb 6, 2026

Summary

This PR makes SGLang rollout compatible with LoRA training in VERL by refreshing LoRA adapters dynamically (unload + load) on each update_weights() call, without restarting the SGLang server.

In practice, this unblocks RL/RLHF training with Qwen-family models + LoRA + SGLang (when using an SGLang version that supports Qwen LoRA initialization; see “Notes / Requirements” below).

Problem

With SGLang as the rollout backend and LoRA enabled, the rollout server cannot be kept in sync with training reliably:

  • SGLang applies LoRA only when a request specifies lora_path. VERL’s SGLang generate path did not pass lora_path, so rollouts could silently run without the currently-trained adapter (a request sketch follows this list).
  • SGLang keeps LoRA weights in a separate adapter pool, so the generic tensor-based weight sync path (update_weights_from_tensor) cannot update LoRA weights by parameter names like *.lora_A.weight.
  • In some configurations, weight updates could also crash because the worker attempted to treat base weights as “not synced” and materialized tensors on CPU, while SGLang’s weight-update path expects GPU tensors.
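
For illustration, here is a minimal sketch of a rollout request that applies the adapter. The server address, adapter name, and payload fields are assumptions for illustration, not the exact code in this PR; it assumes SGLang's native /generate endpoint accepts a lora_path field in the request JSON:

import requests

SGLANG_URL = "http://localhost:30000"   # hypothetical server address
ADAPTER_NAME = "verl_lora_rank0"        # hypothetical adapter name

# Without "lora_path" the request is served by the base model only;
# including it asks SGLang to apply the named adapter from its pool.
payload = {
    "text": "Solve: 12 + 7 = ?",
    "sampling_params": {"max_new_tokens": 64, "temperature": 0.7},
    "lora_path": ADAPTER_NAME,
}
resp = requests.post(f"{SGLANG_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["text"])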

What changed

  • SGLang HTTP client: add sync + async wrappers for SGLang’s /load_lora_adapter and /unload_lora_adapter endpoints.
  • Rollout request: when LoRA is enabled, include lora_path in each generate request so the currently-loaded adapter is actually applied.
  • Dynamic refresh on update_weights() (see the sketch after this list):
    • Split incoming weights into “base” vs “LoRA” items.
    • Save LoRA tensors to a PEFT-style adapter directory (written atomically, using safetensors).
    • Call unload_lora_adapter() then load_lora_adapter() for a fixed adapter name derived from replica/node rank.
  • Stability:
    • Treat base weights as preloaded by the SGLang server for LoRA training (so we don’t try to sync base weights through the rollout update path).
    • Treat HTTP 400 from unload_lora_adapter as a benign “not loaded” case during refresh.
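
A minimal sketch of the refresh flow described above (sync wrappers only; helper names, directory layout, and request payload fields are illustrative assumptions, not the exact code in this PR):

import os
import shutil

import requests
from safetensors.torch import save_file


def load_lora_adapter(base_url, name, path):
    # Ask the SGLang server to load a PEFT-style adapter directory.
    r = requests.post(f"{base_url}/load_lora_adapter",
                      json={"lora_name": name, "lora_path": path})
    r.raise_for_status()


def unload_lora_adapter(base_url, name):
    r = requests.post(f"{base_url}/unload_lora_adapter",
                      json={"lora_name": name})
    # HTTP 400 is treated as "adapter not currently loaded" and ignored.
    if r.status_code not in (200, 400):
        r.raise_for_status()


def refresh_lora(base_url, name, weights, adapter_dir):
    # 1. Split incoming weights: only LoRA parameters are refreshed here;
    #    base weights are assumed to have been loaded by the server at startup.
    lora_weights = {k: v.detach().cpu().contiguous()
                    for k, v in weights.items() if ".lora_" in k}

    # 2. Write the adapter snapshot to a temp directory, then rename it into
    #    place so the server never sees a half-written directory.
    tmp_dir = adapter_dir + ".tmp"
    shutil.rmtree(tmp_dir, ignore_errors=True)
    os.makedirs(tmp_dir)
    save_file(lora_weights, os.path.join(tmp_dir, "adapter_model.safetensors"))
    # A matching adapter_config.json would also be written here (omitted for brevity).
    shutil.rmtree(adapter_dir, ignore_errors=True)
    os.replace(tmp_dir, adapter_dir)

    # 3. Unload the stale adapter (a 400 means it was never loaded),
    #    then load the freshly written one under the same fixed name.
    unload_lora_adapter(base_url, name)
    load_lora_adapter(base_url, name, adapter_dir)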

Notes / Requirements

  • This implementation intentionally uses file-based adapter refresh (load_lora_adapter / unload_lora_adapter) and does not switch to the in-memory tensor endpoint.
  • This assumes the SGLang server loads base model weights from model_path at startup and (as is typical for LoRA training) base weights are not updated during training; only LoRA parameters are refreshed.
  • For Qwen-family models, LoRA init requires an SGLang version where get_hidden_dim() can handle common HF module names like q_proj in the fallback path. If you run into NotImplementedError: get_hidden_dim not implemented for q_proj, please use an SGLang build that includes the corresponding fix (or any newer release that contains it).

How to reproduce

The easiest repro in this repo is:

DATA_DIR="$HOME/data/gsm8k"
MODEL="Qwen/Qwen3-0.6B"
LORA_TARGETS='[q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj]'

RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 \
PYTHONUNBUFFERED=1 \
python3 -m verl.trainer.main_ppo \
  algorithm.adv_estimator=grpo_vectorized \
  data.train_files="${DATA_DIR}/train.parquet" \
  data.val_files="${DATA_DIR}/test.parquet" \
  data.train_batch_size=16 \
  data.max_prompt_length=512 \
  data.max_response_length=256 \
  actor_rollout_ref.model.path="${MODEL}" \
  actor_rollout_ref.model.lora_rank=32 \
  actor_rollout_ref.model.lora_alpha=32 \
  actor_rollout_ref.model.target_modules=all-linear \
  actor_rollout_ref.actor.ppo_mini_batch_size=16 \
  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
  actor_rollout_ref.rollout.name=sglang \
  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  actor_rollout_ref.rollout.gpu_memory_utilization=0.60 \
  actor_rollout_ref.rollout.load_format=safetensors \
  actor_rollout_ref.rollout.n=4 \
  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.enable_lora=True \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.max_lora_rank=32 \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.lora_target_modules="${LORA_TARGETS}" \
  trainer.logger=console \
  trainer.val_before_train=False \
  trainer.n_gpus_per_node=1 \
  trainer.nnodes=1 \
  trainer.total_training_steps=1

The command above runs a single training step as a smoke test; increasing trainer.total_training_steps (e.g. to 20) gives a short baseline-vs-patched comparison. Either way, it validates that training no longer crashes when SGLang+LoRA is enabled and that rollouts apply the refreshed adapter.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces dynamic LoRA adapter refreshing for SGLang rollouts, a valuable feature for RLHF training loops. The implementation correctly utilizes SGLang's /load_lora_adapter and /unload_lora_adapter endpoints and handles the expected HTTP 400 error during the unload process. However, a security audit identified two high-severity vulnerabilities: missing authentication for SGLang HTTP requests and insecure file permissions for LoRA adapters stored in shared memory. These issues could lead to unauthorized access to sensitive model weights and potential control over the SGLang server. Additionally, the LoRA target modules are hardcoded, disregarding user configuration, and a broad except Exception: pass is used, which could conceal critical errors.

@JohnConnor123 force-pushed the feature/sglang_dynamic_lora branch from e6bc6f1 to a2753c3 on February 6, 2026 at 14:01
@JohnConnor123 force-pushed the feature/sglang_dynamic_lora branch from a2753c3 to c93593b on February 6, 2026 at 14:04