[rollout] feat: Allow customization of async server class (verl-project#2326)

vyomakesh0728 · vyomakesh0728 · commit 1852b822f790 · 2025-07-03T17:52:20.000+08:00
### What does this PR do?

This PR contains two changes:

1. Introduces a new configuration option, `actor_rollout_ref.rollout.agent.custom_async_server`, that lets users customize the async server class.
2. Makes `load_extern_type` more robust and adds support for prefixes such as `pkg://` and `file://`, without breaking any existing features or supported paths.

Without this PR, it is impossible to use a customized version of `AsyncvLLMServer` in custom use cases. We currently rely on a set of ugly monkey patches to achieve this.

Ultimately, I believe `rollout.name` and `rollout.agent.custom_async_server` can be combined. However, `rollout.name` is currently referenced in too many places, and it is quite difficult for me to handle all of them.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: [link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+async+server)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

I have tested this on our internal pipelines. The new patch works as expected, and the existing async servers still work as before.

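The prefix handling described for `load_extern_type` can be illustrated in isolation. The sketch below mirrors the logic in this diff under a hypothetical name, `resolve_type`; the standard-library modules and the temporary file used in the demo are illustrative assumptions only, not anything verl ships.

```python
import importlib
import importlib.util
import os
import tempfile
from collections import OrderedDict


def resolve_type(path, type_name):
    """Resolve `type_name` from a `pkg://` module path or a (`file://`) file path.

    Hypothetical helper that mirrors the prefix rules of `load_extern_type`.
    """
    if not path:
        return None

    if path.startswith("pkg://"):
        # "pkg://a.b.c" and "pkg://a/b/c" both name the importable module a.b.c.
        module = importlib.import_module(path[len("pkg://"):].replace("/", "."))
    else:
        # The "file://" prefix is optional; a bare path behaves as it did before.
        if path.startswith("file://"):
            path = path[len("file://"):]
        if not os.path.exists(path):
            raise FileNotFoundError(f"Custom type file '{path}' not found.")
        spec = importlib.util.spec_from_file_location("custom_module", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

    if not hasattr(module, type_name):
        raise AttributeError(f"Custom type '{type_name}' not found in '{path}'.")
    return getattr(module, type_name)


# pkg:// accepts dotted or slash-separated module paths.
assert resolve_type("pkg://collections", "OrderedDict") is OrderedDict
assert resolve_type("pkg://os/path", "join") is os.path.join

# A bare file path and the same path with file:// load the same class definition.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "my_server.py")
    with open(file_path, "w") as f:
        f.write("class CustomizedServer:\n    pass\n")
    for candidate in (file_path, "file://" + file_path):
        assert resolve_type(candidate, "CustomizedServer").__name__ == "CustomizedServer"
```

Note that the `file://` handling is plain prefix stripping rather than full URI parsing, so `file:///abs/path.py` becomes `/abs/path.py` and `file://rel/path.py` becomes the relative path `rel/path.py`.
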
### API and Usage Example

Our config looks something like this:

```yaml
hydra:
  searchpath:
    - pkg://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
  filter_overlong_prompts: false

actor_rollout_ref:
  rollout:
    mode: async
    agent:
      custom_async_server:
        path: pkg://mypackage.verl.async_server
        name: CustomizedvLLMServer
```

### High-Level Design

This PR is pretty straightforward.

### Specific Changes

- Update the docs (`docs/examples/config.rst`) and add the `custom_async_server` defaults to `ppo_trainer.yaml` and `ppo_megatron_trainer.yaml`.
- Pass the custom server path and class name to `async_server_class` in the agent loop (`verl/experimental/agent_loop/agent_loop.py`) and the async server manager (`verl/workers/rollout/async_server.py`).
- Update the `load_extern_type` implementation (`verl/utils/import_utils.py`).

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: I think it is quite troublesome to add a CI test for this feature. I can add one if you feel it is necessary.
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).

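The selection rule that `async_server_class` applies to the `path` and `name` settings can be sketched without verl installed. Everything below is a stand-in written for illustration: `FakeVLLMServer`, `FakeSGLangServer`, `select_server_class`, and `fake_loader` are hypothetical names, and in the actual diff the loader role is played by `load_extern_type`.

```python
from typing import Callable, Optional


class FakeVLLMServer:
    """Hypothetical stand-in for the built-in vLLM async server."""


class FakeSGLangServer:
    """Hypothetical stand-in for the built-in SGLang async server."""


class CustomizedvLLMServer(FakeVLLMServer):
    """Hypothetical user-defined server, as named in the usage example."""


DEFAULT_SERVERS = {"vllm": FakeVLLMServer, "sglang": FakeSGLangServer}


def select_server_class(
    backend: str,
    module: Optional[str] = None,
    class_name: Optional[str] = None,
    loader: Optional[Callable[[str, str], type]] = None,
) -> type:
    """Pick the built-in class unless both `module` and `class_name` are given."""
    if module is None and class_name is None:
        # No customization: fall back to the class registered for the backend alias.
        if backend not in DEFAULT_SERVERS:
            raise NotImplementedError(f"rollout backend {backend} is not supported")
        return DEFAULT_SERVERS[backend]

    if module is None or class_name is None:
        # Half-configured customization is rejected instead of silently ignored.
        raise ValueError("module and class_name must both be provided for customization")

    return loader(module, class_name)


def fake_loader(module: str, class_name: str) -> type:
    """Illustrative loader that resolves names from this file's globals."""
    return globals()[class_name]


# `path: null` and `name: null` keep the default behavior.
assert select_server_class("vllm") is FakeVLLMServer

# Setting both values switches to the custom class.
chosen = select_server_class(
    "vllm", "pkg://mypackage.verl.async_server", "CustomizedvLLMServer", loader=fake_loader
)
assert chosen is CustomizedvLLMServer
```

One consequence visible in the diff is that once both values are set, the `rollout.name` alias is no longer consulted when choosing the server class. Given the `Type[AsyncServerBase]` return annotation, a custom class is presumably expected to derive from `AsyncServerBase`, although the diff does not enforce this.
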
diff --git a/docs/examples/config.rst b/docs/examples/config.rst
@@ -194,6 +194,11 @@ Actor/Rollout/Reference Policy
         n: 1
         do_sample: False # default eager for validation
 
+      agent:
+        custom_async_server: # Use custom async server implementation for rollout
+          path: null
+          name: null
+
 **Common config for actor, rollout and reference model**
 
 - ``actor_rollout_ref.hybrid_engine``: Whether it's a hybrid engine,
diff --git a/verl/experimental/agent_loop/agent_loop.py b/verl/experimental/agent_loop/agent_loop.py
@@ -341,9 +341,14 @@ def _initialize_llm_servers(self):
         self.async_llm_servers = [None] * self.rollout_dp_size
         self.server_addresses = [None] * self.rollout_dp_size
 
-        server_class = async_server_class(
-            rollout_backend=self.config.actor_rollout_ref.rollout.name,
-        )
+        if self.config.actor_rollout_ref.rollout.agent.custom_async_server:
+            server_class = async_server_class(
+                rollout_backend=self.config.actor_rollout_ref.rollout.name,
+                rollout_backend_module=self.config.actor_rollout_ref.rollout.agent.custom_async_server.path,
+                rollout_backend_class=self.config.actor_rollout_ref.rollout.agent.custom_async_server.name,
+            )
+        else:
+            server_class = async_server_class(rollout_backend=self.config.actor_rollout_ref.rollout.name)
 
         # Start all server instances, restart if address already in use.
         unready_dp_ranks = set(range(self.rollout_dp_size))
diff --git a/verl/trainer/config/ppo_megatron_trainer.yaml b/verl/trainer/config/ppo_megatron_trainer.yaml
@@ -255,13 +255,18 @@ actor_rollout_ref:
       # Number of agent loop workers
       num_workers: 8
 
+      custom_async_server:
+        path: null
+        name: null
+
     # support logging rollout prob for debugging purpose
     calculate_log_probs: False
     # Nsight system profiler configs
     profiler:
       discrete: False
       all_ranks: False
       ranks: null
+
 critic:
   rollout_n: ${actor_rollout_ref.rollout.n}
   strategy: ${actor_rollout_ref.actor.strategy}
diff --git a/verl/trainer/config/ppo_trainer.yaml b/verl/trainer/config/ppo_trainer.yaml
@@ -576,6 +576,15 @@ actor_rollout_ref:
       # Number of agent loop workers
       num_workers: 8
 
+      # custom async server configs
+      custom_async_server:
+
+        # Path to the custom async server implementation
+        path: null
+
+        # Class name of the custom async server class (e.g. AsyncvLLMServer)
+        name: null
+
 # configs for the critic
 critic:
 
diff --git a/verl/utils/import_utils.py b/verl/utils/import_utils.py
@@ -79,21 +79,35 @@ def import_external_libs(external_libs=None):
 
 def load_extern_type(file_path: Optional[str], type_name: Optional[str]):
     """Load a external data type based on the file path and type name"""
+    import importlib
     import importlib.util
     import os
 
     if not file_path:
         return None
 
-    if not os.path.exists(file_path):
-        raise FileNotFoundError(f"Custom type file '{file_path}' not found.")
-
-    spec = importlib.util.spec_from_file_location("custom_module", file_path)
-    module = importlib.util.module_from_spec(spec)
-    try:
-        spec.loader.exec_module(module)
-    except Exception as e:
-        raise RuntimeError(f"Error loading module from '{file_path}'") from e
+    if file_path.startswith("pkg://"):
+        # pkg://verl.utils.dataset.rl_dataset
+        # pkg://verl/utils/dataset/rl_dataset
+        module_name = file_path[6:].replace("/", ".")
+        module = importlib.import_module(module_name)
+
+    else:
+        # file://verl/utils/dataset/rl_dataset
+        # file:///path/to/verl/utils/dataset/rl_dataset.py
+        # or without file:// prefix
+        if file_path.startswith("file://"):
+            file_path = file_path[7:]
+
+        if not os.path.exists(file_path):
+            raise FileNotFoundError(f"Custom type file '{file_path}' not found.")
+
+        spec = importlib.util.spec_from_file_location("custom_module", file_path)
+        module = importlib.util.module_from_spec(spec)
+        try:
+            spec.loader.exec_module(module)
+        except Exception as e:
+            raise RuntimeError(f"Error loading module from '{file_path}'") from e
 
     if not hasattr(module, type_name):
         raise AttributeError(f"Custom type '{type_name}' not found in '{file_path}'.")
diff --git a/verl/workers/rollout/async_server.py b/verl/workers/rollout/async_server.py
@@ -18,7 +18,7 @@
 import threading
 from abc import ABC, abstractmethod
 from contextlib import asynccontextmanager
-from typing import Any, Dict, List, Tuple, Type
+from typing import Any, Dict, List, Optional, Tuple, Type
 
 import fastapi
 import ray
@@ -135,9 +135,14 @@ def __init__(self, config: DictConfig, worker_group: RayWorkerGroup):
         self.async_llm_servers = [None] * self.rollout_dp_size
         self.server_addresses = [None] * self.rollout_dp_size
 
-        server_class = async_server_class(
-            rollout_backend=self.config.rollout.name,
-        )
+        if self.config.rollout.agent.custom_async_server:
+            server_class = async_server_class(
+                rollout_backend=self.config.rollout.name,
+                rollout_backend_module=self.config.rollout.agent.custom_async_server.path,
+                rollout_backend_class=self.config.rollout.agent.custom_async_server.name,
+            )
+        else:
+            server_class = async_server_class(rollout_backend=self.config.rollout.name)
 
         # Start all server instances, restart if address already in use.
         unready_dp_ranks = set(range(self.rollout_dp_size))
@@ -233,22 +238,38 @@ def generate_sequences(self, prompts: DataProto, **sampling_params) -> DataProto
         return future.result()
 
 
-def async_server_class(rollout_backend: str) -> Type[AsyncServerBase]:
+def async_server_class(
+    rollout_backend: str, rollout_backend_module: Optional[str] = None, rollout_backend_class: Optional[str] = None
+) -> Type[AsyncServerBase]:
     """Get async server class.
 
     Args:
-        rollout_backend: str, rollout backend, should be "vllm" or "sglang".
+        rollout_backend: str, rollout backend type (alias), should be "vllm" or "sglang".
+        rollout_backend_module: Optional[str], import path of the rollout backend.
+        rollout_backend_class: Optional[str], class name of the rollout backend.
 
     Returns:
         Type[AsyncServerBase]: async server class.
     """
-    if rollout_backend == "vllm":
-        from verl.workers.rollout.vllm_rollout.vllm_async_server import AsyncvLLMServer
+    if rollout_backend_class is None and rollout_backend_module is None:
+        # If both are None, use the default backend class
+        # Do not change the original import behavior
+        # importlib.import_module and from ... import ... have subtle differences in ray
 
-        return AsyncvLLMServer
-    elif rollout_backend == "sglang":
-        from verl.workers.rollout.sglang_rollout.async_sglang_server import AsyncSglangServer
+        if rollout_backend == "vllm":
+            from verl.workers.rollout.vllm_rollout.vllm_async_server import AsyncvLLMServer
 
-        return AsyncSglangServer
-    else:
-        raise NotImplementedError
+            return AsyncvLLMServer
+        elif rollout_backend == "sglang":
+            from verl.workers.rollout.sglang_rollout.async_sglang_server import AsyncSglangServer
+
+            return AsyncSglangServer
+        else:
+            raise NotImplementedError(f"rollout backend {rollout_backend} is not supported")
+
+    if rollout_backend_module is None or rollout_backend_class is None:
+        raise ValueError("rollout_backend_module and rollout_backend_class must be both provided for customization")
+
+    from verl.utils.import_utils import load_extern_type
+
+    return load_extern_type(rollout_backend_module, rollout_backend_class)