Open Deep Research Training

This tutorial demonstrates how to train your own deep research agent with GRPO until it exceeds Sonnet 4's performance. Specifically, you will use the ART library to specialize Qwen 2.5 14B for LangChain's Open Deep Research framework, and you will evaluate your agent's performance with DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents. In addition to the GRPO training step, you will run an initial SFT training run to improve the model's baseline performance.

The chart below shows the accuracy of a Qwen 2.5 14B Instruct model (the same model you will be training) as it learns to perform deep research, eventually exceeding the performance of GPT-4.1 and Sonnet 4. With any luck, your model will be able to do the same!

Getting Started

1. Install dependencies

If you haven't already, install uv by following its official installation instructions.

Then install the project dependencies by running uv sync.
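
In practice, this whole step is two commands (the installer one-liner is taken from uv's documentation; skip it if uv is already installed):

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv (macOS/Linux)
uv sync                                           # install project dependencies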

2. Install SkyPilot/RunPod

We'll be using LocalBackend to manage the GPU that your model will be trained on. In order to provision a GPU for your training run, you'll need to have SkyPilot installed on your machine and provide it with the credentials to spin up machines on at least one infra provider.

We recommend using RunPod for its ease of use, but any infra provider that SkyPilot supports will work.

Follow RunPod's Getting Started guide. You'll have to provide a credit card to use RunPod, but you'll only pay for the time your GPUs are running.
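
As a sketch, wiring up credentials and verifying them looks like this (command names come from the RunPod and SkyPilot CLIs; prefix them with uv run if they live in this project's uv environment rather than on your PATH):

runpod config   # paste your RunPod API key when prompted
sky check       # confirm SkyPilot sees at least one enabled infra provider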

3. Set up the optional environment variables found in .env.example

Copy .env.example to .env at the root of the repository, and fill in the values for the environment variables. If you're unsure about any of the values, refer to ENV_INSTRUCTIONS.md.
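
Concretely, from the repository root:

cp .env.example .env   # then open .env and fill in each value (see ENV_INSTRUCTIONS.md)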

4. Run the training scripts

You'll want to run these scripts in this order:

uv run collect_sft.py  # Collect samples for your SFT training run (~1 hour)
uv run run_sft.py      # Run your SFT training run (~1 hour)
uv run run_train.py    # Run your RL training run (>1 day)

5. Generate the benchmarks

Run the benchmark script in the evaluate folder with the models you want to benchmark:

uv run evaluate/benchmark_model.py

Then run the evaluate/display_benchmarks.ipynb notebook to display the results.
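
If you don't have a notebook server handy, one way to open it is through uv's --with flag (this assumes Jupyter is not a declared project dependency; adjust if your environment already provides it):

uv run --with jupyter jupyter notebook evaluate/display_benchmarks.ipynb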

Modifications

We modified the DeepResearch Bench repo to add a new script, run_single_race_bench.py, which lets you run a single benchmark at a time, as needed during RL training runs.

We modified the Open Deep Research repo to switch search over to Tavily's advanced search answering, which enables training models with smaller context windows.
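
For context, this maps onto Tavily's documented search parameters: an advanced-depth search with include_answer set returns a short synthesized answer in addition to the raw results, and the fork presumably feeds the model that answer rather than full page content, which is what keeps contexts small. A minimal sketch of such a request against Tavily's public API (the query is a placeholder; this illustrates the parameters, not the repo's actual code):

curl -s https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"query": "open deep research agents", "search_depth": "advanced", "include_answer": true}'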

Acknowledgements

Huge thanks to the LangChain and Tavily teams for collaborating on this project and providing the services that the agent is built on. Additionally, we greatly appreciate the overall support, feedback, and adoption that ART has received from the open source community.
