[sglang_rollout] feat: enable dynamic LoRA refresh for SGLang rollout #5220
JohnConnor123 wants to merge 2 commits into verl-project:main
Conversation
Code Review
This pull request introduces dynamic LoRA adapter refreshing for SGLang rollouts, a valuable feature for RLHF training loops. The implementation correctly utilizes SGLang's /load_lora_adapter and /unload_lora_adapter endpoints and handles the expected HTTP 400 error during the unload process. However, a security audit identified two high-severity vulnerabilities: missing authentication for SGLang HTTP requests and insecure file permissions for LoRA adapters stored in shared memory. These issues could lead to unauthorized access to sensitive model weights and potential control over the SGLang server. Additionally, the LoRA target modules are hardcoded, disregarding user configuration, and a broad except Exception: pass is used, which could conceal critical errors.
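For illustration, a minimal hardening sketch along the lines the review suggests (not code from this PR): it sends the refresh request with a bearer token taken from a hypothetical SGLANG_API_KEY environment variable, surfaces HTTP failures instead of swallowing them, and restricts the permissions of the adapter directory saved in shared memory. The endpoint and field names follow recent SGLang releases and should be checked against the version in use.

```python
import os
import requests

SGLANG_URL = os.environ.get("SGLANG_URL", "http://127.0.0.1:30000")  # hypothetical env var
API_KEY = os.environ.get("SGLANG_API_KEY")                           # hypothetical env var

def load_adapter(name: str, path: str) -> None:
    # Authenticate if the server was launched with an API key; otherwise send no header.
    headers = {"Authorization": f"Bearer {API_KEY}"} if API_KEY else {}
    resp = requests.post(
        f"{SGLANG_URL}/load_lora_adapter",
        json={"lora_name": name, "lora_path": path},
        headers=headers,
        timeout=60,
    )
    resp.raise_for_status()  # fail loudly instead of `except Exception: pass`

def restrict_adapter_dir(path: str) -> None:
    # Keep adapters written under shared memory readable only by the current user.
    os.chmod(path, 0o700)
```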
Summary
This PR makes SGLang rollout compatible with LoRA training in VERL by refreshing LoRA adapters dynamically (unload + load) on each update_weights() call, without restarting the SGLang server. In practice, this unblocks RL/RLHF training with Qwen-family models + LoRA + SGLang (when using an SGLang version that supports Qwen LoRA initialization; see "Notes / Requirements" below).
Problem
With SGLang as the rollout backend and LoRA enabled, the rollout server cannot be kept in sync with training reliably:
- SGLang only applies a loaded LoRA adapter when the generate request specifies lora_path. VERL's SGLang generate path did not pass lora_path, so rollouts could silently run without the currently-trained adapter (see the sketch after this list).
- The in-memory weight update path (update_weights_from_tensor) cannot update LoRA weights by parameter names like *.lora_A.weight.
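A minimal sketch of the first issue, assuming a local SGLang HTTP server on port 30000 and the generate-request fields used by recent SGLang releases (the adapter name is illustrative): unless lora_path is set, the request is served by the base model even when an adapter is loaded.

```python
import requests

resp = requests.post(
    "http://127.0.0.1:30000/generate",  # assumed local SGLang server
    json={
        "text": "Hello",
        "sampling_params": {"max_new_tokens": 16},
        # Without this field the server answers from the base model even if an
        # adapter is loaded; the PR threads the current adapter name through here.
        "lora_path": "verl_lora_adapter_0",  # hypothetical adapter name
    },
    timeout=60,
)
print(resp.json())
```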
What changed
- Refreshes LoRA adapters through SGLang's /load_lora_adapter and /unload_lora_adapter endpoints.
- Passes lora_path in each generate request so the currently-loaded adapter is actually applied.
- On update_weights(): saves the current LoRA weights to disk (safetensors), then calls unload_lora_adapter() followed by load_lora_adapter() for a fixed adapter name derived from replica/node rank (see the sketch after this list).
- Treats HTTP 400 from unload_lora_adapter as a benign "not loaded" case during refresh.
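A sketch of the refresh step described above, under the same assumptions about the server address and request fields; the adapter name and directory are placeholders. An HTTP 400 on unload is treated as "nothing to unload" (e.g. the very first refresh); every other failure is raised.

```python
import requests

BASE_URL = "http://127.0.0.1:30000"  # assumed local SGLang server

def refresh_lora(adapter_name: str, adapter_dir: str) -> None:
    # Unload first; HTTP 400 simply means the adapter was not loaded yet.
    resp = requests.post(
        f"{BASE_URL}/unload_lora_adapter",
        json={"lora_name": adapter_name},
        timeout=60,
    )
    if resp.status_code not in (200, 400):
        resp.raise_for_status()
    # Load the freshly saved safetensors adapter under a stable, rank-derived name.
    resp = requests.post(
        f"{BASE_URL}/load_lora_adapter",
        json={"lora_name": adapter_name, "lora_path": adapter_dir},
        timeout=60,
    )
    resp.raise_for_status()
```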
Notes / Requirements
- Uses SGLang's HTTP LoRA endpoints (load_lora_adapter / unload_lora_adapter) and does not switch to the in-memory tensor endpoint.
- The base model is loaded from model_path at startup and (as is typical for LoRA training) base weights are not updated during training; only LoRA parameters are refreshed (see the sketch after this list).
- Requires an SGLang version whose get_hidden_dim() can handle common HF module names like q_proj in the fallback path. If you run into NotImplementedError: get_hidden_dim not implemented for q_proj, please use an SGLang build that includes the corresponding fix (or any newer release that contains it).
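To illustrate the "only LoRA parameters are refreshed" note, a sketch (not code from this PR) that filters an actor state dict down to LoRA tensors before writing the safetensors file; the key pattern and output filename follow the usual PEFT layout and are assumptions (a real adapter directory also needs an adapter_config.json).

```python
import torch
from safetensors.torch import save_file

def extract_lora_tensors(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Base weights are never pushed to SGLang, so only the LoRA tensors need saving.
    return {k: v.detach().cpu().contiguous() for k, v in state_dict.items() if "lora_" in k}

def save_adapter(state_dict: dict[str, torch.Tensor], out_file: str) -> None:
    # e.g. out_file = "/dev/shm/verl_lora_adapter_0/adapter_model.safetensors"
    save_file(extract_lora_tensors(state_dict), out_file)
```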
How to reproduce
The easiest repro in this repo runs a short baseline vs. patched comparison (20 steps) and validates that training no longer crashes when SGLang + LoRA is enabled and that rollouts apply the refreshed adapter.