
[sglang_rollout] feat: enable dynamic LoRA refresh for SGLang rollout #5220

Open

JohnConnor123 wants to merge 2 commits into verl-project:main from JohnConnor123:feature/sglang_dynamic_lora

Conversation

@JohnConnor123 (Contributor) commented Feb 6, 2026

Summary

This PR makes SGLang rollout compatible with LoRA training in VERL by refreshing LoRA adapters dynamically (unload + load) on each update_weights() call, without restarting the SGLang server.

In practice, this unblocks RL/RLHF training with Qwen-family models + LoRA + SGLang (when using an SGLang version that supports Qwen LoRA initialization; see “Notes / Requirements” below).

Problem

With SGLang as the rollout backend and LoRA enabled, the rollout server cannot be kept in sync with training reliably:

  • SGLang applies LoRA only when a request specifies lora_path. VERL’s SGLang generate path did not pass lora_path, so rollouts could silently run without the currently-trained adapter (a request sketch follows this list).
  • SGLang keeps LoRA weights in a separate adapter pool, so the generic tensor-based weight sync path (update_weights_from_tensor) cannot update LoRA weights by parameter names like *.lora_A.weight.
  • In some configurations, weight updates could also crash because the worker attempted to treat base weights as “not synced” and materialized tensors on CPU, while SGLang’s weight-update path expects GPU tensors.
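
For illustration, here is a minimal sketch of a rollout request that applies the adapter. The server address, adapter name, and payload fields are assumptions for illustration, not the exact code in this PR; it assumes SGLang's native /generate endpoint accepts a lora_path field in the request JSON:

import requests

SGLANG_URL = "http://localhost:30000"   # hypothetical server address
ADAPTER_NAME = "verl_lora_rank0"        # hypothetical adapter name

# Without "lora_path" the request is served by the base model only;
# including it asks SGLang to apply the named adapter from its pool.
payload = {
    "text": "Solve: 12 + 7 = ?",
    "sampling_params": {"max_new_tokens": 64, "temperature": 0.7},
    "lora_path": ADAPTER_NAME,
}
resp = requests.post(f"{SGLANG_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["text"])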

What changed

  • SGLang HTTP client: add sync + async wrappers for SGLang’s /load_lora_adapter and /unload_lora_adapter endpoints.
  • Rollout request: when LoRA is enabled, include lora_path in each generate request so the currently-loaded adapter is actually applied.
  • Dynamic refresh on update_weights() (see the sketch after this list):
    • Split incoming weights into “base” vs “LoRA” items.
    • Save LoRA tensors to a PEFT-style adapter directory (written atomically, using safetensors).
    • Call unload_lora_adapter() then load_lora_adapter() for a fixed adapter name derived from replica/node rank.
  • Stability:
    • Treat base weights as preloaded by the SGLang server for LoRA training (so we don’t try to sync base weights through the rollout update path).
    • Treat HTTP 400 from unload_lora_adapter as a benign “not loaded” case during refresh.
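
A minimal sketch of the refresh flow described above (sync wrappers only; helper names, directory layout, and request payload fields are illustrative assumptions, not the exact code in this PR):

import os
import shutil

import requests
from safetensors.torch import save_file


def load_lora_adapter(base_url, name, path):
    # Ask the SGLang server to load a PEFT-style adapter directory.
    r = requests.post(f"{base_url}/load_lora_adapter",
                      json={"lora_name": name, "lora_path": path})
    r.raise_for_status()


def unload_lora_adapter(base_url, name):
    r = requests.post(f"{base_url}/unload_lora_adapter",
                      json={"lora_name": name})
    # HTTP 400 is treated as "adapter not currently loaded" and ignored.
    if r.status_code not in (200, 400):
        r.raise_for_status()


def refresh_lora(base_url, name, weights, adapter_dir):
    # 1. Split incoming weights: only LoRA parameters are refreshed here;
    #    base weights are assumed to have been loaded by the server at startup.
    lora_weights = {k: v.detach().cpu().contiguous()
                    for k, v in weights.items() if ".lora_" in k}

    # 2. Write the adapter snapshot to a temp directory, then rename it into
    #    place so the server never sees a half-written directory.
    tmp_dir = adapter_dir + ".tmp"
    shutil.rmtree(tmp_dir, ignore_errors=True)
    os.makedirs(tmp_dir)
    save_file(lora_weights, os.path.join(tmp_dir, "adapter_model.safetensors"))
    # A matching adapter_config.json would also be written here (omitted for brevity).
    shutil.rmtree(adapter_dir, ignore_errors=True)
    os.replace(tmp_dir, adapter_dir)

    # 3. Unload the stale adapter (a 400 means it was never loaded),
    #    then load the freshly written one under the same fixed name.
    unload_lora_adapter(base_url, name)
    load_lora_adapter(base_url, name, adapter_dir)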

Notes / Requirements

  • This implementation intentionally uses file-based adapter refresh (load_lora_adapter / unload_lora_adapter) and does not switch to the in-memory tensor endpoint.
  • This assumes the SGLang server loads base model weights from model_path at startup and (as is typical for LoRA training) base weights are not updated during training; only LoRA parameters are refreshed.
  • For Qwen-family models, LoRA init requires an SGLang version where get_hidden_dim() can handle common HF module names like q_proj in the fallback path. If you run into NotImplementedError: get_hidden_dim not implemented for q_proj, please use an SGLang build that includes the corresponding fix (or any newer release that contains it).

How to reproduce

The easiest repro in this repo is:

DATA_DIR="$HOME/data/gsm8k"
MODEL="Qwen/Qwen3-0.6B"
LORA_TARGETS='[q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj]'

RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 \
PYTHONUNBUFFERED=1 \
python3 -m verl.trainer.main_ppo \
  algorithm.adv_estimator=grpo_vectorized \
  data.train_files="${DATA_DIR}/train.parquet" \
  data.val_files="${DATA_DIR}/test.parquet" \
  data.train_batch_size=16 \
  data.max_prompt_length=512 \
  data.max_response_length=256 \
  actor_rollout_ref.model.path="${MODEL}" \
  actor_rollout_ref.model.lora_rank=32 \
  actor_rollout_ref.model.lora_alpha=32 \
  actor_rollout_ref.model.target_modules=all-linear \
  actor_rollout_ref.actor.ppo_mini_batch_size=16 \
  actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
  actor_rollout_ref.rollout.name=sglang \
  actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
  actor_rollout_ref.rollout.gpu_memory_utilization=0.60 \
  actor_rollout_ref.rollout.load_format=safetensors \
  actor_rollout_ref.rollout.n=4 \
  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.enable_lora=True \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.max_lora_rank=32 \
  +actor_rollout_ref.rollout.engine_kwargs.sglang.lora_target_modules="${LORA_TARGETS}" \
  trainer.logger=console \
  trainer.val_before_train=False \
  trainer.n_gpus_per_node=1 \
  trainer.nnodes=1 \
  trainer.total_training_steps=1

The command above runs a single training step as a smoke test; increasing trainer.total_training_steps (e.g. to 20) gives a short baseline-vs-patched comparison. Either way, it validates that training no longer crashes when SGLang+LoRA is enabled and that rollouts apply the refreshed adapter.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces dynamic LoRA adapter refreshing for SGLang rollouts, a valuable feature for RLHF training loops. The implementation correctly utilizes SGLang's /load_lora_adapter and /unload_lora_adapter endpoints and handles the expected HTTP 400 error during the unload process. However, a security audit identified two high-severity vulnerabilities: missing authentication for SGLang HTTP requests and insecure file permissions for LoRA adapters stored in shared memory. These issues could lead to unauthorized access to sensitive model weights and potential control over the SGLang server. Additionally, the LoRA target modules are hardcoded, disregarding user configuration, and a broad except Exception: pass is used, which could conceal critical errors.

@JohnConnor123 force-pushed the feature/sglang_dynamic_lora branch from e6bc6f1 to a2753c3 on February 6, 2026 at 14:01
@JohnConnor123 force-pushed the feature/sglang_dynamic_lora branch from a2753c3 to c93593b on February 6, 2026 at 14:04