
[rollout] feat: Allow customization of async server class #2326

Merged
wuxibin89 merged 4 commits into verl-project:main from ultmaster:third-party-async-server
Jul 3, 2025

Conversation

@ultmaster (Contributor) commented Jul 2, 2025

What does this PR do?

This PR makes two changes:

  1. Introduce a new configuration option actor_rollout_ref.rollout.custom_async_server that lets users customize the async server class.
  2. Make load_extern_type more robust and add support for prefixes such as pkg:// and file://, without breaking any existing features or supported paths.
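
To illustrate the second point, here is a minimal sketch of how a loader could dispatch on the pkg:// and file:// prefixes. It is not the actual load_extern_type implementation: the function name load_type_sketch, the module name "custom_module", and the error handling are assumptions made for this example.

```python
import importlib
import importlib.util
import os


def load_type_sketch(path: str, name: str):
    """Hypothetical loader: resolve `path` to a module and return its attribute `name`."""
    if path.startswith("pkg://"):
        # pkg:// prefix: the remainder is treated as a dotted module name.
        module = importlib.import_module(path[len("pkg://"):])
    else:
        # file:// prefix or a bare path: load the module from a source file.
        file_path = path[len("file://"):] if path.startswith("file://") else path
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Module file not found: {file_path}")
        spec = importlib.util.spec_from_file_location("custom_module", file_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    if not hasattr(module, name):
        raise AttributeError(f"{name!r} not found in {path!r}")
    return getattr(module, name)
```

Treating a bare path the same way as file:// is one way to keep previously supported paths working; the real function may handle more cases.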

Without this PR, it is impossible to use a customized version of AsyncvLLMServer for custom use cases. We currently rely on a set of ugly monkey patches to achieve this.
Ultimately, I believe rollout.name and rollout.custom_async_server can be combined, but rollout.name is currently referenced in too many places for me to handle all of them.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: link
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

I have tested on our internal pipelines. The new patch works as expected and the old async servers still work as usual.

API and Usage Example

Our config is something like this:

hydra:
  searchpath:
    - pkg://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
  filter_overlong_prompts: false

actor_rollout_ref:
  rollout:
    mode: async
    custom_async_server:
      path: pkg://mypackage.verl.async_server
      name: CustomizedvLLMServer
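
As a rough illustration of how such a config could be consumed, the sketch below picks the custom class when custom_async_server is set and otherwise falls back to a default. The names resolve_server_class and DefaultServer are hypothetical, the rollout config is modelled as a plain dict, and only the pkg:// form from the example above is handled; this is not the code in the PR.

```python
import importlib


class DefaultServer:
    """Stand-in for the built-in async server class (hypothetical)."""


def resolve_server_class(rollout_cfg: dict, default_cls: type = DefaultServer) -> type:
    """Return the custom server class if configured, else the default."""
    custom = rollout_cfg.get("custom_async_server") or {}
    path, name = custom.get("path"), custom.get("name")
    if not path or not name:
        # Option unset: keep the existing behaviour.
        return default_cls
    # Only the pkg:// form from the example config is handled in this sketch.
    prefix = "pkg://"
    module_name = path[len(prefix):] if path.startswith(prefix) else path
    return getattr(importlib.import_module(module_name), name)
```

With the config above, rollout_cfg["custom_async_server"] would carry path "pkg://mypackage.verl.async_server" and name "CustomizedvLLMServer", so the sketch would import mypackage.verl.async_server and return its CustomizedvLLMServer attribute.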

High-Level Design

This PR is pretty straightforward.

Specific Changes

Update the docs. Update behavior in agent loop and async server manager. Update load_extern_type implementation.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant commented Jul 2, 2025

CLA assistant check
All committers have signed the CLA.

num_workers: 8

# [Experimental] custom async server configs
custom_async_server:
Collaborator

Please move custom_async_server under agent field, async server is only used by agent.

Contributor Author

moved
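
Assuming the move was made as requested, the option would sit under rollout.agent rather than directly under rollout. The sketch below shows that nesting as a plain dict and a hypothetical helper, get_custom_async_server, that reads it; the exact key layout in the merged config is an assumption based on the review comment above.

```python
def get_custom_async_server(config: dict):
    """Return (path, name) from rollout.agent.custom_async_server, or None if unset."""
    agent_cfg = config.get("actor_rollout_ref", {}).get("rollout", {}).get("agent", {})
    custom = agent_cfg.get("custom_async_server") or {}
    if custom.get("path") and custom.get("name"):
        return custom["path"], custom["name"]
    return None


# Assumed layout after the move: custom_async_server nested under `agent`.
config = {
    "actor_rollout_ref": {
        "rollout": {
            "mode": "async",
            "agent": {
                "custom_async_server": {
                    "path": "pkg://mypackage.verl.async_server",
                    "name": "CustomizedvLLMServer",
                },
            },
        },
    },
}
```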

@wuxibin89 wuxibin89 merged commit bc2cc6b into verl-project:main Jul 3, 2025
43 of 45 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jul 4, 2025
…ct#2326)

### What does this PR do?

This PR contains two aspects:

1. Introduction of a new configuration option
`actor_rollout_ref.rollout.custom_async_server` to allow users to
customize the async server class.
2. Make `load_extern_type` more robust and support prefix like `pkg://`
or `file://`, while non-breaking to any existing features and supported
paths.

Without this PR, it's impossible to use a customized version of
AsyncvLLMServer in customized use case. We are currently using a set of
ugly monkey patch to achieve this goal.
Ultimately I believe `rollout.name` and `rollout.custom_async_server`
can be combined. But `rollout.name` is currently referenced in too many
places. It's quite difficult for me to handle all of them.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here:
[link](https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+async+server)
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

I have tested on our internal pipelines. The new patch works as expected
and the old async servers still work as usual.

### API and Usage Example

Our config is something like this:

```yaml
hydra:
  searchpath:
    - pkg://verl/trainer/config

defaults:
  - ppo_trainer
  - _self_

data:
  filter_overlong_prompts: false

actor_rollout_ref:
  rollout:
    mode: async
    custom_async_server:
      path: pkg://mypackage.verl.async_server
      name: CustomizedvLLMServer
```

### High-Level Design

This PR is pretty straightforward.

### Specific Changes

Update the docs. Update behavior in agent loop and async server manager.
Update `load_extern_type` implementation.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: I think it's quite
troublesome to add a CI for this feature. I can add one if you feel
necessary.
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
SuperCB pushed a commit to SuperCB/verl that referenced this pull request Jul 7, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026