Skip to content

[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang#1630

Merged
eric-haibin-lin merged 47 commits intoverl-project:mainfrom
kinza99:duhe/multi_turns_with_feedback
Jun 22, 2025
Merged

[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang#1630
eric-haibin-lin merged 47 commits intoverl-project:mainfrom
kinza99:duhe/multi_turns_with_feedback

Conversation

@kinza99
Copy link
Contributor

@kinza99 kinza99 commented May 22, 2025

Checklist Before Starting

  • Searched for similar PR(s).
  • Checked PR Title format
    • In format of: [modules] type: Title
    • modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • type is in feat, fix, refactor, chore, test
    • can involve multiple modules, seperated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

Implements multi-turn interaction system for reinforcement learning training, enabling dynamic conversational feedback and iterative problem-solving scenarios with extensible agent-based architecture.

Test

Comprehensive testing has been performed including:

  • Unit Tests: Complete test suite in tests/interactions/test_gsm8k_interaction.py with 15+ test cases covering:
    • Initialization and configuration validation
    • Interaction lifecycle (start → generate → calculate → finalize)
    • Correct/incorrect answer handling with GSM8K scoring integration
    • Edge cases (empty messages, malformed content, concurrent sessions)
    • Resource cleanup and error handling
  • Integration Tests: SGLang rollout integration in tests/workers/rollout/test_sglang_async_rollout_w_interaction.py:
    • Multi-GPU distributed testing (requires 2+ GPUs)
    • FSDP model sharding with SGLang inference engine
    • End-to-end interaction workflow with interaction_kwargs parameter passing
    • Comparison with HuggingFace baseline for output validation
  • Real-world Validation: Training script testing with Qwen2.5-0.5B model demonstrating:
    • GRPO algorithm integration with interaction-based rewards
    • Multi-turn conversation handling in production training loops

High-Level Design

Multi-Turn Interaction System Architecture:

The system introduces a flexible, async-based interaction framework designed for RL training scenarios with the following key components:

  1. BaseInteraction Class: Core abstraction layer providing async interface for interaction agents
  2. Instance Management: Stateful session management with unique instance IDs for concurrent interactions
  3. SGLang Integration: Seamless integration with SGLang rollout system for multi-turn conversations
  4. Configuration-Driven Loading: Dynamic agent loading via YAML configuration files
  5. Reward Integration: Turn-level scoring mechanism integrated with VERL's reward system

Specific Changes

Core Implementation:

  • Added BaseInteraction abstract class with async interface in verl/interactions/base.py
  • Implemented Gsm8kInteraction concrete class for math problem solving scenarios in verl/interactions/gsm8k_interaction.py
  • Added instance-based state management for concurrent interaction sessions via _instance_dict
  • Created turn-level scoring mechanism using verl.utils.reward_score.gsm8k with flexible answer extraction

SGLang Rollout Integration (verl/workers/rollout/sglang_rollout/sglang_rollout.py):

  • Extended _async_rollout_a_request method with AsyncRolloutRequestStateEnum.INTERACTING state
  • Added interaction initialization in _handle_pending_state method (line 878-880)
  • Implemented interaction response generation with reward accumulation (line 823-835)
  • Dynamic interaction loading via _intitalize_interaction method with importlib-based class resolution
  • Configuration-driven setup supporting interaction_config_path parameter

Testing Infrastructure:

  • Comprehensive unit test suite in tests/interactions/test_gsm8k_interaction.py with pytest and asyncio support
  • Integration tests in tests/workers/rollout/test_sglang_async_rollout_w_interaction.py with distributed GPU testing
  • Mock-based testing using unittest.mock.patch for GSM8K scoring validation

Configuration & Examples:

  • GSM8K interaction configuration in examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
  • Training script: run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh with interaction_config_path parameter
  • Support for max_user_turns and max_assistant_turns configuration

API

BaseInteraction Interface:

from verl.interactions.base import BaseInteraction
from typing import Dict, Any, List, Tuple, Optional

class CustomInteraction(BaseInteraction):
    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
    
    async def start_interaction(self, instance_id: Optional[str] = None, **kwargs) -> str:
        # Initialize interaction session, return instance_id
        pass
    
    async def generate_response(self, instance_id: str, messages: List[Dict[str, Any]], **kwargs) -> Tuple[bool, str, float, Dict[str, Any]]:
        # Generate response, return (should_terminate, response, score, metadata)
        pass
    
    async def calculate_score(self) -> float:
        # Calculate turn-level score for RL training
        pass
        
    async def finalize_interaction(self) -> None:
        # Clean up resources
        pass

Usage Example

GSM8K Interaction Configuration:

# gsm8k_interaction_config.yaml
interaction:
  - class_name: "verl.interactions.gsm8k_interaction.Gsm8kInteraction"
    config: {}

Training Script Integration:

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo_w_interaction' \
    algorithm.adv_estimator=grpo \
    data.train_batch_size=512 \
    data.return_raw_chat=True \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml" \
    trainer.total_epochs=15

GSM8K Interaction Implementation:

class Gsm8kInteraction(BaseInteraction):
    def __init__(self, config: dict):
        super().__init__(config)
        self._instance_dict = {}

    async def start_interaction(self, instance_id=None, ground_truth=None, **kwargs):
        if instance_id is None:
            instance_id = str(uuid4())
        self._instance_dict[instance_id] = {
            "response": "",
            "ground_truth": ground_truth,
            "reward": 0.0,
        }
        return instance_id

    async def generate_response(self, instance_id, messages, **kwargs):
        # Extract last user message content
        content = ""
        for item in reversed(messages):
            if item.get("role") == "user":
                content = item.get("content", "")
                break

        # Ensure GSM8K format (#### prefix)
        if content.startswith("#### "):
            self._instance_dict[instance_id]["response"] = content
        else:
            self._instance_dict[instance_id]["response"] = "#### " + content

        reward = await self.calculate_score(instance_id)
        if reward == 1.0:
            return True, "Your response is correct!", 1.0, {}
        else:
            return False, "Your response is incorrect! You need to reflect on your answer and try again.", 0.0, {}

    async def calculate_score(self, instance_id, **kwargs):
        return gsm8k.compute_score(
            self._instance_dict[instance_id]["response"],
            self._instance_dict[instance_id]["ground_truth"],
            method="flexible", format_score=0.0, score=1.0,
        )

Contributor List

  • He Du (Author)
  • Xiang Long (Co-author)
  • Yanbin Jiang (Discussed integrated with LangGraph scene)
  • Junrong Lin (Reviewer)
  • Haibin Lin (Reviewer)
  • Chenyang Zhao (PM)

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that covers the code path.

@CLAassistant
Copy link

CLAassistant commented May 22, 2025

CLA assistant check
All committers have signed the CLA.

@kinza99
Copy link
Contributor Author

kinza99 commented May 22, 2025

@zhaochenyang20

@zhaochenyang20
Copy link
Collaborator

nice work!

@zhaochenyang20
Copy link
Collaborator

We will take a look these days, stay tuned in 24h
.

@zhaochenyang20
Copy link
Collaborator

@kinza99 my wechat is 18015766633. Feel free to discuss, thanks!

@kinza99 kinza99 force-pushed the duhe/multi_turns_with_feedback branch from 2d46b4e to 1629248 Compare May 28, 2025 07:13
# "release_kwargs": {},
},
},
"feedback_kwargs": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to rename it to "interaction_kwargs".


actor_rollout_ref:
rollout:
feedback_config_file: <path_to_feedback_yaml_file>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interaction_config_file

Simulates: get id + state init
"""
# ...implement the logic to get ID and initialize state...
interaction_id = "some_unique_id"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interaction_id = "some_unique_id"
return interaction_id

async def generate_response(self, messages: Any) -> Tuple[bool, str, float, Dict[str, Any]]: # More clear response generation method
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing instance_id in doc, it would be better keep instance id logic in tool to track session.


reward = await self.calculate_score(instance_id)
if reward == 1.0:
feedback = "Your response is correct!"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback -> response

_req.state = AsyncRolloutRequestStateEnum.WAITING
else:
break
elif _req.state == AsyncRolloutRequestStateEnum.WAITING:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WAITING is a bad state name, is there any other concrete state name for interacting with Interaction cls?

break
else:
_req.add_user_message(self.tokenizer, content, format=self.config.multi_turn.format)
if len(_req.input_ids) >= self.config.max_model_len:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need add unit test for new added interaction related logic.

@kinza99 kinza99 force-pushed the duhe/multi_turns_with_feedback branch from 1629248 to c0176c1 Compare May 29, 2025 06:32
@kinza99 kinza99 requested a review from SwordFaith May 29, 2025 07:59
@ocss884 ocss884 self-assigned this May 29, 2025
response_length = micro_batch["responses"].size(-1)
multi_modal_inputs = {}
if "multi_modal_inputs" in micro_batch:
if "multi_modal_inputs" in micro_batch.keys():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why adding lot .keys() in this PR? For ... in dict_like_obj the statements are equivalent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some versions, you may encounter the problem NotImplementedError: TensorDict does not support membership checks with the in keyword. If you want to check if a particular key is in your TensorDict, please use key in tensordict.keys() instead.

"""
# ...implement the logic to calculate turn-level score...
score = 0.0
return score
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about raise NotImplementedError

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about raise NotImplementedError

The calc_score method is not essential for implementing interactions. Here, we use 0.0 to ensure it has no impact on the aggregated reward.

import psutil
import torch
import torch.distributed
import torch.distributed as dist
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate torch.distributed

@@ -0,0 +1,14 @@
# Copyright 2023-2024 SGLang Team
# Copyright 2025 ModelBest Inc. and/or its affiliates
#
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add the copyright of bytedance

@SwordFaith
Copy link
Collaborator

To-do:

  1. Review new examples in wandb log.
  2. Refactor duplicated unit tests.

@eric-haibin-lin
Copy link
Collaborator

There seems to be lots of changes. I'll take some time to review as well

@SwordFaith SwordFaith mentioned this pull request Jun 21, 2025
8 tasks
from verl.interactions.base import BaseInteraction
from typing import Dict, Any, List, Tuple, Optional

class BaseInteraction:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just an OpenAI GymEnv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess env or tool can both support in current tools submodule. Interaction is designed to 2 sceneraios:

  • partial exposure
  • inject context with workflow as user role (as many framework do)

So BaseInteraction support generate response from model api our other workflow part as endpoint. In partial exposure user requirements, typically need message history to dynamicly add user requirements. In workflow context injection, it typically need messages parsed to other node input and inject other node output back to current node. And we can use e2e outcome to train current node model (If not in current model, need to extend in finalize).

else:
return instance_id

async def generate_response(self, instance_id: str, messages: List[Dict[str, Any]], **kwargs) -> Tuple[bool, str, float, Dict[str, Any]]: # More clear response generation method
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: use a pydantic class as return instead of Tuple

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will switch to pydantic cls in future.

self._sgl_tools,
self._function_call_parser,
) = self._initialize_tools(config, tokenizer)
self.interaction: dict[str, BaseInteraction] = self._intitalize_interaction(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: self.interaction is not a dict

Copy link
Collaborator

@SwordFaith SwordFaith Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be fixed in _initialize_interaction, which should support init multiple interactions in same config file. I guess we coulld merge 1630 first and add multiple interactions in future. Current impl already support single task training with interaction.

elif _req.state == AsyncRolloutRequestStateEnum.INTERACTING:
user_turns += 1
messages = [{"role": x.role, "content": x.content} for x in _req.messages]
should_terminate_sequence, content, reward, metrics = await self.interaction.generate_response(_req.request_id, messages, **_req.interaction_kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support choose different interaction based on assistant response? Or we just use one interaction, and dispatch in it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support different interaction cls, according to sample level config, like tools. For message history difference,should be handled inside interaction.

# add index for each prompt
index = row_dict.get("extra_info", {}).get("index", 0)
tools_kwargs = row_dict.get("extra_info", {}).get("tools_kwargs", {})
interaction_kwargs = row_dict.get("extra_info", {}).get("interaction_kwargs", {})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a config to control whether to include the tool related fields in dataproto? This may increase the size if tools are not used

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2145 Tracked by new issue, will refactor in future.

@eric-haibin-lin eric-haibin-lin enabled auto-merge (squash) June 22, 2025 16:43
@eric-haibin-lin eric-haibin-lin disabled auto-merge June 22, 2025 16:46
@eric-haibin-lin eric-haibin-lin merged commit c7aa5e8 into verl-project:main Jun 22, 2025
36 of 37 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 23, 2025
Sirius-L1 pushed a commit to Sirius-L1/verl that referenced this pull request Jun 24, 2025
@leoleoasd
Copy link

image
Why is user message used to calcualte the reward instead of the assistant message?

@SwordFaith
Copy link
Collaborator

image Why is user message used to calcualte the reward instead of the assistant message?

We are developing a message and turn-level reward system. Calculating rewards during the user's turn doesn't necessarily mean that the user's message requires a reward. Instead, it allows the user to reallocate rewards between the assistant's turns based on the user's turn rewards.

chenhaiq pushed a commit that referenced this pull request Jun 27, 2025
)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
Tyizhanshen pushed a commit to HyperdriveHustle/verl that referenced this pull request Jul 1, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jan 20, 2026
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.

This PR implements multi-interaction support in SGLangRollout, enabling
sample-level interaction selection similar to the existing tools system.
The implementation includes a new interaction registry system that
allows multiple named
interactions to be configured and used within a single rollout instance.
verl-project#1630

Core Implementation

- New Interaction Registry System: Created
verl/interactions/utils/interaction_registry.py with functions to
dynamically load and manage multiple interaction instances from
configuration files
  - Enhanced SGLangRollout:
- Replaced single interaction attribute with interaction_map: dict[str,
BaseInteraction]
- Updated _initialize_interactions() method to support multiple
interactions via registry
- Modified interaction selection logic to use interaction_kwargs.name
for sample-level binding
- Configuration Updates: Added name field support in interaction config
format with automatic name generation fallback

  Data Processing

- Updated GSM8K Preprocessing: Modified
examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name
field in interaction_kwargs
- Enhanced Configuration: Updated
examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml
with explicit name field

  Testing & Quality

- Comprehensive Test Suite: Added
tests/interactions/test_interaction_registry.py with full coverage of
registry functionality
- Integration Tests: Created
tests/workers/rollout/test_sglang_multi_interaction.py for
multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support
new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing
interactions, and edge cases

  Backward Compatibility

- Graceful Degradation: When no interaction config is provided, system
works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified
in interaction_kwargs
- Existing API Preservation: All existing interaction functionality
remains unchanged

 Key Features

1. Sample-Level Selection: Each sample can specify which interaction to
use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for
consistency
3. Automatic Naming: Intelligent name generation from class names (e.g.,
Gsm8kInteraction → gsm8k)
  4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic
derivation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.