[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang #1630
Conversation
|
nice work! |
|
We will take a look these days, stay tuned in 24h |
|
@kinza99 my wechat is 18015766633. Feel free to discuss, thanks! |
2d46b4e to
1629248
Compare
| # "release_kwargs": {}, | ||
| }, | ||
| }, | ||
| "feedback_kwargs": { |
There was a problem hiding this comment.
It might be better to rename it to "interaction_kwargs".
docs/sglang_multiturn/multiturn.rst
Outdated
|
|
||
| actor_rollout_ref: | ||
| rollout: | ||
| feedback_config_file: <path_to_feedback_yaml_file> |
There was a problem hiding this comment.
interaction_config_file
verl/interactions/base.py
Outdated
| Simulates: get id + state init | ||
| """ | ||
| # ...implement the logic to get ID and initialize state... | ||
| interaction_id = "some_unique_id" |
There was a problem hiding this comment.
It might be better to support a UUID by default, like https://github.com/volcengine/verl/blob/main/verl/tools/base_tool.py#L50.
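A minimal sketch of the suggestion above, assuming the interaction exposes an async `start_interaction` method that accepts an optional `instance_id`. The class name `SketchInteraction` is hypothetical and is not verl's actual code; it only illustrates falling back to a generated UUID instead of a hard-coded id.

```python
import asyncio
import uuid
from typing import Optional


class SketchInteraction:
    """Hypothetical stand-in for an interaction base class (not verl's actual code)."""

    async def start_interaction(self, instance_id: Optional[str] = None) -> str:
        # Generate a fresh UUID when the caller does not supply an id.
        if instance_id is None:
            return str(uuid.uuid4())
        else:
            return instance_id


first = asyncio.run(SketchInteraction().start_interaction())
second = asyncio.run(SketchInteraction().start_interaction())
explicit = asyncio.run(SketchInteraction().start_interaction("session-42"))
```

With this default, two sessions started without an explicit id never share state by accident, while a caller-provided id is passed through unchanged.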
verl/interactions/base.py
Outdated
| interaction_id = "some_unique_id" | ||
| return interaction_id | ||
|
|
||
| async def generate_response(self, messages: Any) -> Tuple[bool, str, float, Dict[str, Any]]: # More clear response generation method |
There was a problem hiding this comment.
Sorry for missing instance_id in the doc. It would be better to keep the instance id logic, as in tool, to track the session.
|
|
||
| reward = await self.calculate_score(instance_id) | ||
| if reward == 1.0: | ||
| feedback = "Your response is correct!" |
| _req.state = AsyncRolloutRequestStateEnum.WAITING | ||
| else: | ||
| break | ||
| elif _req.state == AsyncRolloutRequestStateEnum.WAITING: |
There was a problem hiding this comment.
WAITING is a bad state name. Is there a more concrete state name for interacting with the Interaction cls?
| break | ||
| else: | ||
| _req.add_user_message(self.tokenizer, content, format=self.config.multi_turn.format) | ||
| if len(_req.input_ids) >= self.config.max_model_len: |
There was a problem hiding this comment.
Need to add unit tests for the newly added interaction-related logic.
1629248 to
c0176c1
Compare
| response_length = micro_batch["responses"].size(-1) | ||
| multi_modal_inputs = {} | ||
| if "multi_modal_inputs" in micro_batch: | ||
| if "multi_modal_inputs" in micro_batch.keys(): |
There was a problem hiding this comment.
Just curious: why add so many .keys() calls in this PR? For `... in dict_like_obj`, the two statements are equivalent.
There was a problem hiding this comment.
In some versions, you may encounter the problem NotImplementedError: TensorDict does not support membership checks with the in keyword. If you want to check if a particular key is in your TensorDict, please use key in tensordict.keys() instead.
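The difference described in this reply can be illustrated without TensorDict itself. The sketch below uses a hypothetical dict-like class, `StrictMapping`, whose `__contains__` raises `NotImplementedError`. It is only a stand-in for the behaviour quoted above, not TensorDict's implementation.

```python
class StrictMapping:
    """Hypothetical dict-like container that rejects `key in obj`, as described above."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Mimic a container that refuses direct membership checks.
        raise NotImplementedError("use `key in obj.keys()` instead")

    def keys(self):
        return self._data.keys()


micro_batch = StrictMapping({"responses": [1, 2, 3]})

try:
    "responses" in micro_batch  # direct check goes through __contains__
    direct_check_failed = False
except NotImplementedError:
    direct_check_failed = True

# Checking against .keys() works for plain dicts and for the strict container.
has_responses = "responses" in micro_batch.keys()
has_multi_modal = "multi_modal_inputs" in micro_batch.keys()
```

For a plain `dict` both forms give the same answer, so writing `.keys()` explicitly costs nothing and stays portable across container types.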
| """ | ||
| # ...implement the logic to calculate turn-level score... | ||
| score = 0.0 | ||
| return score |
There was a problem hiding this comment.
How about raise NotImplementedError
There was a problem hiding this comment.
> How about raise NotImplementedError

The calculate_score method is not essential for implementing interactions. Here, we use 0.0 to ensure it has no impact on the aggregated reward.
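The trade-off in this thread can be sketched as follows. All class names are hypothetical, and the aggregation rule (outcome reward plus turn-level score) is an assumption made only for illustration. A neutral default of 0.0 lets subclasses skip turn-level scoring, whereas raising `NotImplementedError` forces every subclass to override the method.

```python
import asyncio


class SketchBaseInteraction:
    """Hypothetical base class; the method name follows the diff above."""

    async def calculate_score(self, instance_id: str) -> float:
        # Neutral default: subclasses that skip turn-level scoring contribute 0.0.
        return 0.0


class StrictBaseInteraction:
    """Alternative from the review: force subclasses to implement scoring."""

    async def calculate_score(self, instance_id: str) -> float:
        raise NotImplementedError


class NoScoreInteraction(SketchBaseInteraction):
    """An interaction that only gives feedback and never overrides calculate_score."""


async def aggregate(interaction, instance_id: str, outcome_reward: float) -> float:
    # Assumed aggregation rule for illustration: outcome reward plus turn-level score.
    return outcome_reward + await interaction.calculate_score(instance_id)


total = asyncio.run(aggregate(NoScoreInteraction(), "req-1", 1.0))

try:
    asyncio.run(aggregate(StrictBaseInteraction(), "req-1", 1.0))
    strict_raised = False
except NotImplementedError:
    strict_raised = True
```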
| import psutil | ||
| import torch | ||
| import torch.distributed | ||
| import torch.distributed as dist |
There was a problem hiding this comment.
Duplicate torch.distributed
| @@ -0,0 +1,14 @@ | |||
| # Copyright 2023-2024 SGLang Team | |||
| # Copyright 2025 ModelBest Inc. and/or its affiliates | |||
| # | |||
There was a problem hiding this comment.
pls add the copyright of bytedance
|
To-do:
|
|
There seem to be lots of changes. I'll take some time to review as well |
| from verl.interactions.base import BaseInteraction | ||
| from typing import Dict, Any, List, Tuple, Optional | ||
|
|
||
| class BaseInteraction: |
There was a problem hiding this comment.
Is this just an OpenAI GymEnv?
There was a problem hiding this comment.
I guess both env and tool can be supported in the current tools submodule. Interaction is designed for 2 scenarios:
- partial exposure
- inject context with workflow as user role (as many frameworks do)
So BaseInteraction supports generating a response from a model API, or from another part of the workflow, as an endpoint. With partial exposure of user requirements, it typically needs the message history to dynamically add user requirements. With workflow context injection, it typically needs the messages parsed into another node's input, and the other node's output injected back into the current node. We can then use the e2e outcome to train the current node's model (if not in the current model, this needs to be extended in finalize).
| else: | ||
| return instance_id | ||
|
|
||
| async def generate_response(self, instance_id: str, messages: List[Dict[str, Any]], **kwargs) -> Tuple[bool, str, float, Dict[str, Any]]: # More clear response generation method |
There was a problem hiding this comment.
P2: use a pydantic class as return instead of Tuple
There was a problem hiding this comment.
Will switch to pydantic cls in future.
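As a rough illustration of the structured return discussed here, the sketch below uses a standard-library dataclass as a stand-in. The class name `InteractionResult` is hypothetical, and the field names are taken from the tuple unpacking in this PR's diff. A pydantic model would declare the same fields and add validation on top.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InteractionResult:
    """Hypothetical structured return; field names mirror the tuple in the diff."""

    should_terminate_sequence: bool
    content: str
    reward: float
    metrics: Dict[str, Any] = field(default_factory=dict)


result = InteractionResult(
    should_terminate_sequence=True,
    content="Your response is correct!",
    reward=1.0,
)
```

Callers then read `result.reward` by name rather than relying on tuple position, which makes adding a field later less error-prone.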
| self._sgl_tools, | ||
| self._function_call_parser, | ||
| ) = self._initialize_tools(config, tokenizer) | ||
| self.interaction: dict[str, BaseInteraction] = self._intitalize_interaction(config) |
There was a problem hiding this comment.
typo: self.interaction is not a dict
There was a problem hiding this comment.
It could be fixed in _initialize_interaction, which should support initializing multiple interactions from the same config file. I guess we could merge 1630 first and add multiple interactions in the future. The current impl already supports single-task training with interaction.
| elif _req.state == AsyncRolloutRequestStateEnum.INTERACTING: | ||
| user_turns += 1 | ||
| messages = [{"role": x.role, "content": x.content} for x in _req.messages] | ||
| should_terminate_sequence, content, reward, metrics = await self.interaction.generate_response(_req.request_id, messages, **_req.interaction_kwargs) |
There was a problem hiding this comment.
Do we support choosing a different interaction based on the assistant response? Or do we just use one interaction and dispatch inside it?
There was a problem hiding this comment.
We support different interaction cls according to sample-level config, like tools. Differences in message history should be handled inside the interaction.
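A minimal sketch of sample-level selection, assuming each sample's `interaction_kwargs` carries a `name` key that picks one of several configured interactions. The helper `select_interaction`, the class `FixedReplyInteraction`, and the map contents are all hypothetical.

```python
from typing import Any, Dict


class FixedReplyInteraction:
    """Hypothetical interaction that always answers with the same hint."""

    def __init__(self, hint: str):
        self.hint = hint


def select_interaction(interaction_map: Dict[str, Any], interaction_kwargs: Dict[str, Any]) -> Any:
    # The sample decides which interaction handles it via interaction_kwargs["name"].
    name = interaction_kwargs.get("name")
    if name not in interaction_map:
        raise ValueError(f"Unknown interaction name: {name!r}")
    return interaction_map[name]


interaction_map = {
    "gsm8k": FixedReplyInteraction("check the arithmetic"),
    "code_review": FixedReplyInteraction("check the edge cases"),
}

math_sample_kwargs = {"name": "gsm8k", "ground_truth": "42"}
code_sample_kwargs = {"name": "code_review"}

math_interaction = select_interaction(interaction_map, math_sample_kwargs)
code_interaction = select_interaction(interaction_map, code_sample_kwargs)
```

Dispatch on the assistant response itself would then live inside the selected interaction's own response logic, as the reply suggests.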
| # add index for each prompt | ||
| index = row_dict.get("extra_info", {}).get("index", 0) | ||
| tools_kwargs = row_dict.get("extra_info", {}).get("tools_kwargs", {}) | ||
| interaction_kwargs = row_dict.get("extra_info", {}).get("interaction_kwargs", {}) |
There was a problem hiding this comment.
Could you add a config to control whether to include the tool related fields in dataproto? This may increase the size if tools are not used
There was a problem hiding this comment.
Tracked by new issue #2145; will refactor in future.
…rl-project#2184)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

This PR implements multi-interaction support in SGLangRollout, enabling sample-level interaction selection similar to the existing tools system. The implementation includes a new interaction registry system that allows multiple named interactions to be configured and used within a single rollout instance. verl-project#1630

Core Implementation
- New Interaction Registry System: Created verl/interactions/utils/interaction_registry.py with functions to dynamically load and manage multiple interaction instances from configuration files
- Enhanced SGLangRollout:
  - Replaced single interaction attribute with interaction_map: dict[str, BaseInteraction]
  - Updated _initialize_interactions() method to support multiple interactions via registry
  - Modified interaction selection logic to use interaction_kwargs.name for sample-level binding
- Configuration Updates: Added name field support in interaction config format with automatic name generation fallback

Data Processing
- Updated GSM8K Preprocessing: Modified examples/data_preprocess/gsm8k_multiturn_w_interaction.py to inject name field in interaction_kwargs
- Enhanced Configuration: Updated examples/sglang_multiturn/config/interaction_config/gsm8k_interaction_config.yaml with explicit name field

Testing & Quality
- Comprehensive Test Suite: Added tests/interactions/test_interaction_registry.py with full coverage of registry functionality
- Integration Tests: Created tests/workers/rollout/test_sglang_multi_interaction.py for multi-interaction scenarios
- Updated Existing Tests: Modified existing interaction tests to support new name attribute and configuration format
- Error Handling: Added validation for duplicate names, missing interactions, and edge cases

Backward Compatibility
- Graceful Degradation: When no interaction config is provided, system works without interactions (empty interaction_map)
- Default Name Handling: Falls back to "gsm8k" when no name is specified in interaction_kwargs
- Existing API Preservation: All existing interaction functionality remains unchanged

Key Features
1. Sample-Level Selection: Each sample can specify which interaction to use via interaction_kwargs.name
2. Registry Pattern: Similar architecture to existing tools system for consistency
3. Automatic Naming: Intelligent name generation from class names (e.g., Gsm8kInteraction → gsm8k)
4. Duplicate Prevention: Runtime validation prevents naming conflicts
5. Flexible Configuration: Supports both explicit names and automatic derivation
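The registry behaviour listed in this commit message (automatic naming, duplicate prevention, explicit names) can be sketched roughly as follows. This is not the code in interaction_registry.py: the functions `derive_name` and `build_interaction_map`, the entry format, and the naming rule are assumptions for illustration. Classes are passed directly here rather than loaded dynamically from a config file.

```python
from typing import Any, Dict, List


class BaseInteraction:
    """Hypothetical minimal base class for this sketch."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.name = config.get("name")


class Gsm8kInteraction(BaseInteraction):
    pass


class CodeReviewInteraction(BaseInteraction):
    pass


def derive_name(cls: type) -> str:
    # Assumed rule: drop a trailing "Interaction" and lowercase the rest,
    # e.g. Gsm8kInteraction -> gsm8k.
    name = cls.__name__
    if name.endswith("Interaction"):
        name = name[: -len("Interaction")]
    return name.lower()


def build_interaction_map(entries: List[Dict[str, Any]]) -> Dict[str, BaseInteraction]:
    """Build a name -> instance map; each entry is {"cls": ..., "name": optional, "config": optional}."""
    interaction_map: Dict[str, BaseInteraction] = {}
    for entry in entries:
        cls = entry["cls"]
        # Prefer the explicit name, fall back to one derived from the class name.
        name = entry.get("name") or derive_name(cls)
        if name in interaction_map:
            raise ValueError(f"Duplicate interaction name: {name!r}")
        config = dict(entry.get("config", {}), name=name)
        interaction_map[name] = cls(config)
    return interaction_map


interaction_map = build_interaction_map(
    [
        {"cls": Gsm8kInteraction},  # name derived automatically
        {"cls": CodeReviewInteraction, "name": "reviewer"},  # explicit name
    ]
)
registered_names = sorted(interaction_map)
```

An empty entry list yields an empty map, which matches the graceful-degradation case where no interaction config is provided.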

