Update preprocess_search_r1_dataset.py
To adapt to the tool call format of Verl's AgentLoop, modify the prompt as follows:
NLPJCL authored Jan 29, 2026
commit 7ad89859e1eb74e3e4e62eb4205e32032f838503
examples/data_preprocess/preprocess_search_r1_dataset.py: 19 changes (16 additions, 3 deletions)
@@ -30,10 +30,22 @@

 # Configuration constants
 DEFAULT_SYSTEM_CONTENT = "You are a helpful and harmless assistant."
+# DEFAULT_USER_CONTENT_PREFIX = (
+# "Answer the given question. You must conduct reasoning inside <think> and </think> "
+# "first every time you get new information. After reasoning, if you find you lack "
+# "some knowledge, you can call a search engine by <tool_call> query </tool_call> "
+# "and it will return the top searched results between <tool_response> and "
+# "</tool_response>. You can search as many times as your want. If you find no "
+# "further external knowledge needed, you can directly provide the answer inside "
+# "<answer> and </answer>, without detailed illustrations. For example, "
+# "<answer> Beijing </answer>. Question: "
+# )
+#To adapt to the tool call format of Verl's AgentLoop, modify the prompt as follows:
 DEFAULT_USER_CONTENT_PREFIX = (
-"Answer the given question. You must conduct reasoning inside <think> and </think> "
-"first every time you get new information. After reasoning, if you find you lack "
-"some knowledge, you can call a search engine by <tool_call> query </tool_call> "
+"Answer the given question. You must conduct step by step reasoning "
+"first every time you get new information. If you need external information, "
+"call the search tool by returning a JSON object inside <tool_call> tags. "
+"For example: <tool_call>{\"name\": \"search\", \"arguments\": {\"query\": \"your query\"}}</tool_call>. "
 "and it will return the top searched results between <tool_response> and "
 "</tool_response>. You can search as many times as your want. If you find no "
 "further external knowledge needed, you can directly provide the answer inside "
@@ -176,3 +188,4 @@ def apply_process_row(row, split_name=split):
user_content_prefix = DEFAULT_USER_CONTENT_PREFIX

main()
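The new prompt stops asking for a bare query between `<tool_call>` tags and instead asks the model to emit a JSON object with `name` and `arguments` keys inside them. A minimal sketch of why that matters, assuming a simple regex-plus-`json` parser: the helper `extract_tool_calls` below is hypothetical and is not verl's AgentLoop parser, but it shows that a JSON payload yields a structured call while the old free-text payload yields nothing.

```python
import json
import re
from typing import Any

# Matches the <tool_call> ... </tool_call> wrapper requested by the new prompt.
# DOTALL lets the JSON payload span several lines; the lazy group stops at the
# first closing tag, so multiple calls in one response are found separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(response: str) -> list[dict[str, Any]]:
    """Hypothetical helper: return each {"name": ..., "arguments": {...}} call.

    Payloads that are not valid JSON objects, or that lack a "name" key or a
    dict-valued "arguments" key, are skipped.
    """
    calls: list[dict[str, Any]] = []
    for payload in TOOL_CALL_RE.findall(response):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. the old "<tool_call> query </tool_call>" style
        if isinstance(obj, dict) and "name" in obj and isinstance(obj.get("arguments"), dict):
            calls.append(obj)
    return calls


new_style = '<tool_call>{"name": "search", "arguments": {"query": "capital of France"}}</tool_call>'
old_style = "<tool_call> capital of France </tool_call>"
print(extract_tool_calls(new_style))  # [{'name': 'search', 'arguments': {'query': 'capital of France'}}]
print(extract_tool_calls(old_style))  # []
```

The `"search"` tool name and the `query` argument follow the example embedded in the new prompt; everything else here is illustrative.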