Bug Description
(EnvironmentWorker(train_env-0) pid=746905) 2026-03-19T17:41:53.502+08:00 INFO:rock_agent.py:382 [rock.sdk.sandbox.agent.rock_agent] [] [] -- [240498b70e464981a6cfc7165776dc31] Start to Initializing ModelService
(EnvironmentWorker(train_env-0) pid=746905) 2026-03-19T17:41:53.503+08:00 ERROR:rock_agent.py:224 [rock.sdk.sandbox.agent.rock_agent] [] [] -- [240498b70e464981a6cfc7165776dc31] Agent initialization failed - 'RuntimeEnvConfig' object has no attribute 'pip' (elapsed: 0.50s),
Steps to Reproduce
defaults:
- ../config/traj_envs@_here_
- ../config/deepspeed_zero@_here_
- ../config/deepspeed_zero2@_here_
- ../config/deepspeed_zero3@_here_
- ../config/deepspeed_zero3_cpuoffload@_here_
hydra:
run:
dir: .
output_subdir: null
exp_name: "agentic_rollout_swe"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
model_name: ${exp_name}-${now:%Y%m%d_%H%M%S}
rollout_dump_dir: ./output/rollout_dump
system_envs:
USE_MODELSCOPE: '1'
num_gpus_per_node: 8
rpc_timeout: 72000
max_steps: 10
save_steps: 10
logging_steps: 1
eval_steps: 2
resume_from_checkpoint: false
rollout_batch_size: 1
val_batch_size: 1
sequence_length: 65536
max_tokens_per_step: 4096
advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "step_reinforce"
batch_adjust_mode: "random_sample"
step_reward_gamma: 1.0
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0
pretrain: /var/model/Qwen2.5-7B-Instruct
reward_pretrain: /var/model/Qwen2.5-7B-Instruct
actor_train:
model_args:
flash_attn: fa2
disable_gradient_checkpointing: false
dtype: bf16
model_type: ~
actor_infer:
model_args:
flash_attn: fa2
disable_gradient_checkpointing: true
dtype: bf16
generating_args:
max_new_tokens: ${max_tokens_per_step} # single-turn response length
top_p: 1.0
top_k: 50
num_beams: 1
temperature: 1.0
num_return_sequences: 1
stop_strings: ["</tool_call>","</tool_call>\n","\n</tool_call>\n","\n</function>"]
include_stop_str_in_output: true
data_args:
template: qwen3_coder
strategy_args:
strategy_name: vllm
strategy_config:
gpu_memory_utilization: 0.8
block_size: 16
load_format: auto
tensor_parallel_size: 1
device_mapping: list(range(1,2))
reward_normalization:
grouping: traj_group_id # tags(env_type)/traj_group_id(group)/batch(rollout_batch)... group_by reward/adv
method: mean
# norm_mean_type: batch
# norm_std_type: group
train_env_manager:
max_env_num_per_worker: 1
num_env_groups: 1
# under the same group, the env config and env seed are ensured to be equal
group_size: 1
tags: [swebench_native_verified]
num_groups_partition: [1] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
system_envs:
# if you cannot get the Python env in rock due to a connection error, try this; it may expire in the future
ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'
val_env_manager:
max_env_num_per_worker: 1
num_env_groups: 1
group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
tags: [swebench_native_verified]
num_groups_partition: [1] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
system_envs:
# if you cannot get the Python env in rock due to a connection error, try this; it may expire in the future
ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'
max_actions_per_traj: 60
env_manager_cls: roll.pipeline.agentic.env_manager.agent_native_env_manager.AgentNativeStepEnvManager
agent_config_common:
agent_type: swe-agent
model_service_config:
type: "local"
enabled: True
port: 8081
run_cmd: >
sweagent run
--problem_statement.type text
--problem_statement.text <<PROMPT>>
--env.repo.type local
--env.repo.path "$(pwd)"
--env.deployment.type local
--agent.tools.execution_timeout 1000
--agent.model.per_instance_cost_limit 0
--agent.model.total_cost_limit 0
--agent.model.name kimi/kimi-k2.5
--agent.model.api_base "http://localhost:8081/v1"
--agent.model.api_key sk-xxxx
# replace <MODEL_NAME> <BASE_URL> <API_KEY>; keep others as-is
# if repo at workspace root, change env.repo's config like: --env.repo.type preexisting --env.repo.repo_name testbed
# Python agents often run into Python conflicts, so avoid adding RuntimeEnv to PATH in agent_run
skip_wrap_run_cmd: true
runtime_env_config:
type: python
version: "3.12"
custom_install_cmd: git clone https://github.com/SWE-agent/SWE-agent.git && cd SWE-agent && pip install -e .
project_path: "/tmp/testbed"
# SWE-agent requires an existing repository, so we need to initialize one.
post_init_cmds:
- command: "mkdir -p /tmp/testbed && cd /tmp/testbed && git init && echo init >README.md && git add README.md && git -c user.email=test@example.com -c user.name='Test User' commit -m 'Initial commit'"
timeout_seconds: 30
custom_envs:
swebench_native_verified:
env_type: "rock_tb_native_env"
max_steps: ${max_actions_per_traj}
max_tokens_per_step: ${max_tokens_per_step}
env_manager_cls: ${env_manager_cls}
agent_system_template: "agent_system_template placeholder"
agent_template: "agent_template placeholder"
env_config:
dataset_name: /workspace/ROLL/data/swe_bench_verified_example.jsonl
tools: ~
max_steps: ${max_actions_per_traj}
mode: "val"
sandbox_base_url: http://rock:8080 # change to your own service address if needed
user_id: "xxx"
experiment_id: "test_tb_native"
test_files: ["/terminal-bench-datasets/datasets/swebench-verified"]
agent_config: ${agent_config_common}
Expected Behavior
Actual Behavior
Error Logs
Environment Information
- OS: [e.g. Ubuntu 22.04, macOS 13.0, Windows 11]
- Python Version: [e.g. 3.11.5]
- ROCK Version: [e.g. 0.2.0]
- Installation Method: [e.g. pip install rl-rock, source installation with uv]
- Docker Version: [e.g. 24.0.6] (if using sandbox features)
- Deployment Type: [e.g. local, distributed, ray]
ROCK Configuration
- Runtime Environment Type: uv
- Sandbox Image: 3.12
- Resource Allocation: Not set
I asked an LLM, and it gave the following explanation:
Problem analysis
In the RuntimeEnv.create() method (line 64):
python
runtime_env = runtime_class(sandbox=sandbox, runtime_env_config=runtime_env_config)
Here runtime_class is PythonRuntimeEnv, but its constructor signature is:
python
def __init__(self, sandbox: Sandbox, runtime_env_config: PythonRuntimeEnvConfig) -> None:
However, the runtime_env_config that is passed in is actually an instance of the base class RuntimeEnvConfig, not PythonRuntimeEnvConfig.
Root cause
When RockAgentConfig is created, the default value of the runtime_env_config field is:
python
runtime_env_config: RuntimeEnvConfigType | None = Field(default_factory=PythonRuntimeEnvConfig)
However, if the configuration is loaded from YAML or a dict, Pydantic may parse the dict into the base class RuntimeEnvConfig instead of PythonRuntimeEnvConfig.
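To illustrate the failure mode, here is a minimal sketch that uses only standard-library dataclasses as stand-ins for the Pydantic models. The class bodies, the `version` default, and the `describe` helper are assumptions made for illustration and are not ROCK's actual definitions. It only shows why reading a Python-only field such as `pip` from a base-class instance produces the error in the log.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeEnvConfig:
    # Hypothetical base config: only fields shared by every runtime type.
    type: str = "python"
    version: str = "default"


@dataclass
class PythonRuntimeEnvConfig(RuntimeEnvConfig):
    # Hypothetical Python-specific field; the real class may define more.
    pip: list = field(default_factory=list)


def describe(config) -> str:
    """Read a Python-only attribute, as a Python runtime env would."""
    return f"pip packages: {config.pip}"


def try_describe(config) -> str:
    """Return the description, or the AttributeError text on failure."""
    try:
        return describe(config)
    except AttributeError as exc:
        return f"failed: {exc}"


# A base-class instance has no `pip` attribute, so the lookup fails.
base_result = try_describe(RuntimeEnvConfig(type="python", version="3.12"))
# The subclass instance carries the field and works as expected.
python_result = try_describe(PythonRuntimeEnvConfig(type="python", version="3.12"))

print(base_result)
print(python_result)
```

The first line printed carries the same `'RuntimeEnvConfig' object has no attribute 'pip'` message as the log, which is consistent with the explanation that a base-class object reached code expecting the subclass.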
And the suggested fix:
def __init__(self, sandbox: Sandbox, runtime_env_config: PythonRuntimeEnvConfig) -> None:
    # Ensure the config is a PythonRuntimeEnvConfig instance
    if not isinstance(runtime_env_config, PythonRuntimeEnvConfig):
        runtime_env_config = PythonRuntimeEnvConfig.model_validate(runtime_env_config.model_dump())
    ...
Applying this fix resolved the problem.
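The coercion idea can be sketched in isolation as follows. This is a hedged, standard-library analogue, not the ROCK code: the dataclasses stand in for the Pydantic models, `asdict` plays the role of `model_dump()`, constructing the subclass from keyword arguments plays the role of `model_validate()`, and `ensure_python_config` is a hypothetical helper name.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RuntimeEnvConfig:
    # Hypothetical base config: only fields shared by every runtime type.
    type: str = "python"
    version: str = "default"


@dataclass
class PythonRuntimeEnvConfig(RuntimeEnvConfig):
    # Hypothetical Python-specific field; the real class may define more.
    pip: list = field(default_factory=list)


def ensure_python_config(config: RuntimeEnvConfig) -> PythonRuntimeEnvConfig:
    """Return `config` unchanged if it is already the subclass,
    otherwise rebuild it as a PythonRuntimeEnvConfig from its fields."""
    if isinstance(config, PythonRuntimeEnvConfig):
        return config
    # Dump the base-class fields and re-create the object as the subclass,
    # so subclass-only fields such as `pip` receive their defaults.
    return PythonRuntimeEnvConfig(**asdict(config))


loaded = RuntimeEnvConfig(type="python", version="3.12")
coerced = ensure_python_config(loaded)
print(type(coerced).__name__, coerced.version, coerced.pip)
```

Note that a coercion like this treats the symptom at the point of use. If the config is loaded as the base class in the first place, selecting the subclass at load time based on the `type` field may be the more durable fix, though that depends on how ROCK defines `RuntimeEnvConfigType`.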
Component Affected