[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang by kinza99 · Pull Request #1630 · verl-project/verl

kinza99 · 2025-05-22T02:52:09Z

Checklist Before Starting

Searched for similar PR(s).
Checked PR Title format
- In format of: [modules] type: Title
- modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- type is in feat, fix, refactor, chore, test
- can involve multiple modules, seperated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

Implements multi-turn interaction system for reinforcement learning training, enabling dynamic conversational feedback and iterative problem-solving scenarios with extensible agent-based architecture.

Test

Comprehensive testing has been performed including:

Unit Tests: Complete test suite in tests/interactions/test_gsm8k_interaction.py with 15+ test cases covering:

Initialization and configuration validation

Interaction lifecycle (start → generate → calculate → finalize)

Correct/incorrect answer handling with GSM8K scoring integration

Edge cases (empty messages, malformed content, concurrent sessions)

Resource cleanup and error handling

Integration Tests: SGLang rollout integration in tests/workers/rollout/test_sglang_async_rollout_w_interaction.py:

Multi-GPU distributed testing (requires 2+ GPUs)

FSDP model sharding with SGLang inference engine

End-to-end interaction workflow with interaction_kwargs parameter passing

Comparison with HuggingFace baseline for output validation

Real-world Validation: Training script testing with Qwen2.5-0.5B model demonstrating:

GRPO algorithm integration with interaction-based rewards

Multi-turn conversation handling in production training loops

High-Level Design

Multi-Turn Interaction System Architecture:

The system introduces a flexible, async-based interaction framework designed for RL training scenarios with the following key components:

BaseInteraction Class: Core abstraction layer providing async interface for interaction agents

Instance Management: Stateful session management with unique instance IDs for concurrent interactions

SGLang Integration: Seamless integration with SGLang rollout system for multi-turn conversations

Configuration-Driven Loading: Dynamic agent loading via YAML configuration files

Reward Integration: Turn-level scoring mechanism integrated with VERL's reward system

Specific Changes

Core Implementation:

Added BaseInteraction abstract class with async interface in verl/interactions/base.py

Implemented Gsm8kInteraction concrete class for math problem solving scenarios in verl/interactions/gsm8k_interaction.py

Added instance-based state management for concurrent interaction sessions via _instance_dict

Created turn-level scoring mechanism using verl.utils.reward_score.gsm8k with flexible answer extraction

SGLang Rollout Integration (verl/workers/rollout/sglang_rollout/sglang_rollout.py):

Extended _async_rollout_a_request method with AsyncRolloutRequestStateEnum.INTERACTING state

Added interaction initialization in _handle_pending_state method (line 878-880)

Implemented interaction response generation with reward accumulation (line 823-835)

Dynamic interaction loading via _intitalize_interaction method with importlib-based class resolution

Configuration-driven setup supporting interaction_config_path parameter

Testing Infrastructure:

Comprehensive unit test suite in tests/interactions/test_gsm8k_interaction.py with pytest and asyncio support

Integration tests in tests/workers/rollout/test_sglang_async_rollout_w_interaction.py with distributed GPU testing

Mock-based testing using unittest.mock.patch for GSM8K scoring validation

Configuration & Examples:

GSM8K interaction configuration in examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml

Training script: run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh with interaction_config_path parameter

Support for max_user_turns and max_assistant_turns configuration

API

BaseInteraction Interface:

from verl.interactions.base import BaseInteraction
from typing import Dict, Any, List, Tuple, Optional

class CustomInteraction(BaseInteraction):
    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
    
    async def start_interaction(self, instance_id: Optional[str] = None, **kwargs) -> str:
        # Initialize interaction session, return instance_id
        pass
    
    async def generate_response(self, instance_id: str, messages: List[Dict[str, Any]], **kwargs) -> Tuple[bool, str, float, Dict[str, Any]]:
        # Generate response, return (should_terminate, response, score, metadata)
        pass
    
    async def calculate_score(self) -> float:
        # Calculate turn-level score for RL training
        pass
        
    async def finalize_interaction(self) -> None:
        # Clean up resources
        pass

Usage Example

GSM8K Interaction Configuration:

# gsm8k_interaction_config.yaml
interaction:
  - class_name: "verl.interactions.gsm8k_interaction.Gsm8kInteraction"
    config: {}

Training Script Integration:

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo_w_interaction' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=512 \
    data.return_raw_chat=True \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
    trainer.total_epochs=15

GSM8K Interaction Implementation:

class Gsm8kInteraction(BaseInteraction):
    def __init__(self, config: dict):
        super().__init__(config)
        self._instance_dict = {}

    async def start_interaction(self, instance_id=None, ground_truth=None, **kwargs):
        if instance_id is None:
            instance_id = str(uuid4())
        self._instance_dict[instance_id] = {
            "response": "",
            "ground_truth": ground_truth,
            "reward": 0.0,
        }
        return instance_id

    async def generate_response(self, instance_id, messages, **kwargs):
        # Extract last user message content
        content = ""
        for item in reversed(messages):
            if item.get("role") == "user":
                content = item.get("content", "")
                break

        # Ensure GSM8K format (#### prefix)
        if content.startswith("#### "):
            self._instance_dict[instance_id]["response"] = content
        else:
            self._instance_dict[instance_id]["response"] = "#### " + content

        reward = await self.calculate_score(instance_id)
        if reward == 1.0:
            return True, "Your response is correct!", 1.0, {}
        else:
            return False, "Your response is incorrect! You need to reflect on your answer and try again.", 0.0, {}

    async def calculate_score(self, instance_id, **kwargs):
        return gsm8k.compute_score(
            self._instance_dict[instance_id]["response"],
            self._instance_dict[instance_id]["ground_truth"],
            method="flexible", format_score=0.0, score=1.0,
        )

Contributor List

He Du (Author)
Xiang Long (Co-author)
Yanbin Jiang (Discussed integrated with LangGraph scene)
Junrong Lin (Reviewer)
Haibin Lin (Reviewer)
Chenyang Zhao (PM)

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title description if it breaks any API.
Update the documentation about your changes in the docs.
New CI unit test(s) are added to cover the code path.
Rely on existing unit tests on CI that covers the code path.

CLAassistant · 2025-05-22T02:52:16Z

All committers have signed the CLA.

kinza99 · 2025-05-22T11:11:47Z

@zhaochenyang20

zhaochenyang20 · 2025-05-22T18:28:23Z

nice work！

zhaochenyang20 · 2025-05-22T18:32:15Z

We will take a look these days, stay tuned in 24h
.

zhaochenyang20 · 2025-05-22T18:37:56Z

@kinza99 my wechat is 18015766633. Feel free to discuss, thanks!

SwordFaith · 2025-05-28T10:01:44Z

examples/data_preprocess/gsm8k_multiturn_w_tool.py

                            # "release_kwargs": {},
                        },
                    },
+                    "feedback_kwargs": {


It might be better to rename it to "interaction_kwargs".

SwordFaith · 2025-05-28T10:02:03Z

docs/sglang_multiturn/multiturn.rst

+
+    actor_rollout_ref:
+        rollout:
+            feedback_config_file: <path_to_feedback_yaml_file>


interaction_config_file

SwordFaith · 2025-05-28T11:25:13Z

verl/interactions/base.py

+        Simulates: get id + state init
+        """
+        # ...implement the logic to get ID and initialize state...
+        interaction_id = "some_unique_id"


It might be better support uuid by default like https://github.com/volcengine/verl/blob/main/verl/tools/base_tool.py#L50.

SwordFaith · 2025-05-28T11:26:15Z

verl/interactions/base.py

+        interaction_id = "some_unique_id"
+        return interaction_id
+
+    async def generate_response(self, messages: Any) -> Tuple[bool, str, float, Dict[str, Any]]:  # More clear response generation method


Sorry for missing instance_id in doc, it would be better keep instance id logic in tool to track session.

SwordFaith · 2025-05-28T11:27:42Z

verl/interactions/gsm8k_interaction.py

+
+        reward = await self.calculate_score(instance_id)
+        if reward == 1.0:
+            feedback = "Your response is correct!"


feedback -> response

SwordFaith · 2025-05-28T11:31:14Z

verl/workers/rollout/sglang_rollout/async_sglang_rollout.py

+                            _req.state = AsyncRolloutRequestStateEnum.WAITING
+                        else:
+                            break
+            elif _req.state == AsyncRolloutRequestStateEnum.WAITING:


WAITING is a bad state name, is there any other concrete state name for interacting with Interaction cls?

SwordFaith · 2025-05-28T11:32:22Z

verl/workers/rollout/sglang_rollout/async_sglang_rollout.py

+                    break
+                else:
+                    _req.add_user_message(self.tokenizer, content, format=self.config.multi_turn.format)
+                    if len(_req.input_ids) >= self.config.max_model_len:


Need add unit test for new added interaction related logic.

ocss884 · 2025-05-29T15:36:04Z

verl/workers/actor/dp_actor.py

        response_length = micro_batch["responses"].size(-1)
        multi_modal_inputs = {}
-        if "multi_modal_inputs" in micro_batch:
+        if "multi_modal_inputs" in micro_batch.keys():


Just curious why adding lot .keys() in this PR? For ... in dict_like_obj the statements are equivalent

In some versions, you may encounter the problem NotImplementedError: TensorDict does not support membership checks with the in keyword. If you want to check if a particular key is in your TensorDict, please use key in tensordict.keys() instead.

ocss884 · 2025-05-29T15:53:15Z

verl/interactions/base.py

+        """
+        # ...implement the logic to calculate turn-level score...
+        score = 0.0
+        return score


How about raise NotImplementedError

How about raise NotImplementedError

The calc_score method is not essential for implementing interactions. Here, we use 0.0 to ensure it has no impact on the aggregated reward.

ocss884 · 2025-05-29T15:54:51Z

verl/workers/fsdp_workers.py

 import psutil
 import torch
 import torch.distributed
+import torch.distributed as dist


Duplicate torch.distributed

ocss884 · 2025-05-29T16:15:21Z

verl/interactions/__init__.py

@@ -0,0 +1,14 @@
+# Copyright 2023-2024 SGLang Team
+# Copyright 2025 ModelBest Inc. and/or its affiliates
+#


pls add the copyright of bytedance

SwordFaith · 2025-05-30T16:17:38Z

To-do:

Review new examples in wandb log.
Refactor duplicated unit tests.

…edback

eric-haibin-lin · 2025-06-04T20:24:01Z

There seems to be lots of changes. I'll take some time to review as well

vermouth1992 · 2025-06-21T10:39:27Z

docs/sglang_multiturn/interaction_system.rst

+    from verl.interactions.base import BaseInteraction
+    from typing import Dict, Any, List, Tuple, Optional
+
+    class BaseInteraction:


Is this just an OpenAI GymEnv?

I guess env or tool can both support in current tools submodule. Interaction is designed to 2 sceneraios:

partial exposure

inject context with workflow as user role (as many framework do)

So BaseInteraction support generate response from model api our other workflow part as endpoint. In partial exposure user requirements, typically need message history to dynamicly add user requirements. In workflow context injection, it typically need messages parsed to other node input and inject other node output back to current node. And we can use e2e outcome to train current node model (If not in current model, need to extend in finalize).

wuxibin89 · 2025-06-21T13:07:05Z

verl/interactions/base.py

+        else:
+            return instance_id
+
+    async def generate_response(self, instance_id: str, messages: List[Dict[str, Any]], **kwargs) -> Tuple[bool, str, float, Dict[str, Any]]:  # More clear response generation method


P2: use a pydantic class as return instead of Tuple

Will switch to pydantic cls in future.

wuxibin89 · 2025-06-21T13:53:30Z

verl/workers/rollout/sglang_rollout/sglang_rollout.py

            self._sgl_tools,
            self._function_call_parser,
        ) = self._initialize_tools(config, tokenizer)
+        self.interaction: dict[str, BaseInteraction] = self._intitalize_interaction(config)


typo: self.interaction is not a dict

It could be fixed in _initialize_interaction, which should support init multiple interactions in same config file. I guess we coulld merge 1630 first and add multiple interactions in future. Current impl already support single task training with interaction.

wuxibin89 · 2025-06-21T13:56:14Z

verl/workers/rollout/sglang_rollout/sglang_rollout.py

+            elif _req.state == AsyncRolloutRequestStateEnum.INTERACTING:
+                user_turns += 1
+                messages = [{"role": x.role, "content": x.content} for x in _req.messages]
+                should_terminate_sequence, content, reward, metrics = await self.interaction.generate_response(_req.request_id, messages, **_req.interaction_kwargs)


Do we support choose different interaction based on assistant response? Or we just use one interaction, and dispatch in it?

We support different interaction cls, according to sample level config, like tools. For message history difference,should be handled inside interaction.

PeterSH6 · 2025-06-21T14:11:04Z

verl/utils/dataset/rl_dataset.py

        # add index for each prompt
        index = row_dict.get("extra_info", {}).get("index", 0)
        tools_kwargs = row_dict.get("extra_info", {}).get("tools_kwargs", {})
+        interaction_kwargs = row_dict.get("extra_info", {}).get("interaction_kwargs", {})


Could you add a config to control whether to include the tool related fields in dataproto? This may increase the size if tools are not used

#2145 Tracked by new issue, will refactor in future.

…ack in sglang (verl-project#1630)

leoleoasd · 2025-06-24T22:09:59Z

Why is user message used to calcualte the reward instead of the assistant message?

SwordFaith · 2025-06-25T03:08:09Z

Why is user message used to calcualte the reward instead of the assistant message?

We are developing a message and turn-level reward system. Calculating rewards during the user's turn doesn't necessarily mean that the user's message requires a reward. Instead, it allows the user to reallocate rewards between the assistant's turns based on the user's turn rewards.

) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. #1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

…ack in sglang (verl-project#1630)

…rl-project#2184) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630 Core Implementation - New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files - Enhanced SGLangRollout: - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction] - Updated _initialize_interactions() method to support multiple interactions via registry - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding - Configuration Updates: Added name field support in interaction config format with automatic name generation fallback Data Processing - Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs - Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field Testing & Quality - Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality - Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios - Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format - Error Handling: Added validation for duplicate names, missing interactions, and edge cases Backward Compatibility - Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map) - Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs - Existing API Preservation: All existing interaction functionality remains unchanged Key Features 1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name 2. Registry Pattern: Similar architecture to existing tools system for consistency 3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k) 4. Duplicate Prevention: Runtime validation prevents naming conflicts 5. Flexible Configuration: Supports both explicit names and automatic derivation

kinza99 mentioned this pull request May 22, 2025

Multi-turn rollout & agentic RL Status & Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#131

Open

27 tasks

kinza99 force-pushed the duhe/multi_turns_with_feedback branch from 2d46b4e to 1629248 Compare May 28, 2025 07:13

SwordFaith reviewed May 28, 2025

View reviewed changes

kinza99 added 4 commits May 29, 2025 14:10

[FEAT] Support async multi-turn rollout with simulation feedback

fd7a569

[DOC] Update sglang multi-turn rollout doc

1c13235

[Update] update user interaction design

6c2cf1b

[Update] add testing and fix bugs

c0176c1

kinza99 force-pushed the duhe/multi_turns_with_feedback branch from 1629248 to c0176c1 Compare May 29, 2025 06:32

kinza99 requested a review from SwordFaith May 29, 2025 07:59

ocss884 self-assigned this May 29, 2025

ocss884 reviewed May 29, 2025

View reviewed changes

kinza99 and others added 3 commits May 30, 2025 11:18

[Fix] fix some problems

d560cf3

Fix unit-test and separate examples from previous tool

5fbdd11

Fix megatron workers and formatting

cb4baa7

kinza99 and others added 9 commits June 4, 2025 18:11

[Update] merge the latest main version

ed070cc

Add training script

fbfdcd0

Fix assertion

4ea2f1a

Fix max_turns

4b18b69

Fix init interaction missing

878b1aa

Fix interface

8023dcb

Lower gpu mem foot print

dc3157e

Merge remote-tracking branch 'upstream/main' into multi_turns_with_fe…

3104159

…edback

Fix init interaction missing issue

cc31550

SwordFaith mentioned this pull request Jun 21, 2025

[rollout] feat: add agent loop #2124

Merged

8 tasks

vermouth1992 reviewed Jun 21, 2025

View reviewed changes

wuxibin89 reviewed Jun 21, 2025

View reviewed changes

PeterSH6 reviewed Jun 21, 2025

View reviewed changes

SwordFaith mentioned this pull request Jun 22, 2025

[dataset] Support no tool related args in RLHFDataset #2145

Open

eric-haibin-lin approved these changes Jun 22, 2025

View reviewed changes

eric-haibin-lin enabled auto-merge (squash) June 22, 2025 16:43

eric-haibin-lin disabled auto-merge June 22, 2025 16:46

eric-haibin-lin merged commit c7aa5e8 into verl-project:main Jun 22, 2025
36 of 37 checks passed

yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 23, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

f1e735d

…ack in sglang (verl-project#1630)

Sirius-L1 pushed a commit to Sirius-L1/verl that referenced this pull request Jun 24, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

ca6cdee

…ack in sglang (verl-project#1630)

SwordFaith mentioned this pull request Jun 24, 2025

[sglang] feat: Add multi-interaction registry support and testing #2184

Merged

7 tasks

Tyizhanshen pushed a commit to HyperdriveHustle/verl that referenced this pull request Jul 1, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

563b56d

…ack in sglang (verl-project#1630)

chenxia-han mentioned this pull request Jul 3, 2025

[Bug] SGLang async sampling parameter corruption #2087

Open

oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

ce6c02c

…ack in sglang (verl-project#1630)

whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

e920f21

…ack in sglang (verl-project#1630)

chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

0af608e

…ack in sglang (verl-project#1630)

TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025

[sglang] feat: Support async multi-turn rollout with simulation feedb…

296d7ca

…ack in sglang (verl-project#1630)

oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026

[sglang] feat: Support async multi-turn rollout with simulation feedb…

238bbe2

…ack in sglang (verl-project#1630)

vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026

[sglang] feat: Support async multi-turn rollout with simulation feedb…

5625d07

…ack in sglang (verl-project#1630)

Conversation

kinza99 commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist Before Starting

What does this PR do?

Test

High-Level Design

Specific Changes

API

Usage Example

Contributor List

Checklist Before Submitting

Uh oh!

CLAassistant commented May 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kinza99 commented May 22, 2025

Uh oh!

zhaochenyang20 commented May 22, 2025

Uh oh!

zhaochenyang20 commented May 22, 2025

Uh oh!

zhaochenyang20 commented May 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SwordFaith commented May 30, 2025

Uh oh!

eric-haibin-lin commented Jun 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SwordFaith Jun 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

leoleoasd commented Jun 24, 2025

Uh oh!

SwordFaith commented Jun 25, 2025

kinza99 commented May 22, 2025 •

edited

Loading

CLAassistant commented May 22, 2025 •

edited

Loading

SwordFaith Jun 22, 2025 •

edited

Loading