Open Deep Research Training

This tutorial demonstrates how to train your own deep research agent with GRPO until it exceeds Sonnet 4's performance. Specifically, you will use the ART library to specialize Qwen 2.5 14B for LangChain's Open Deep Research framework, and you will evaluate your agent's performance with DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents. In addition to the GRPO training step, you will run an initial SFT training run to improve the model's baseline performance.

The chart below shows the accuracy of a Qwen 2.5 14B Instruct model (the same model you will be training) as it learns to perform deep research, eventually exceeding the performance of GPT-4.1 and Sonnet 4. With any luck, your model will be able to do the same!

Getting Started

1. Install dependencies

If you haven't already, install uv by following its official installation instructions.

Then install the project dependencies by running uv sync.
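
In practice, this whole step is two commands (the installer one-liner is taken from uv's documentation; skip it if uv is already installed):

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv (macOS/Linux)
uv sync                                           # install project dependencies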

2. Install SkyPilot/RunPod

We'll be using LocalBackend to manage the GPU that your model will be trained on. In order to provision a GPU for your training run, you'll need to have SkyPilot installed on your machine and provide it with the credentials to spin up machines on at least one infra provider.

We recommend using RunPod for its ease of use, but any infra provider that SkyPilot supports will work.

Follow RunPod's Getting Started guide. You'll have to provide a credit card to use RunPod, but you'll only pay for the time your GPUs are running.
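
As a sketch, wiring up credentials and verifying them looks like this (command names come from the RunPod and SkyPilot CLIs; prefix them with uv run if they live in this project's uv environment rather than on your PATH):

runpod config   # paste your RunPod API key when prompted
sky check       # confirm SkyPilot sees at least one enabled infra provider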

3. Set up the optional environment variables found in .env.example

Copy .env.example to .env at the root of the repository, and fill in the values for the environment variables. If you're unsure about any of the values, refer to ENV_INSTRUCTIONS.md.
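
Concretely, from the repository root:

cp .env.example .env   # then open .env and fill in each value (see ENV_INSTRUCTIONS.md)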

4. Run the training scripts

You'll want to run these scripts in this order:

uv run collect_sft.py  # Collect samples for your SFT training run (~1 hour)
uv run run_sft.py      # Run your SFT training run (~1 hour)
uv run run_train.py    # Run your RL training run (>1 day)

5. Generate the benchmarks

Run the benchmark script in the evaluate folder with the models you want to benchmark:

uv run evaluate/benchmark_model.py

Then run the evaluate/display_benchmarks.ipynb notebook to display the results.
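
If you don't have a notebook server handy, one way to open it is through uv's --with flag (this assumes Jupyter is not a declared project dependency; adjust if your environment already provides it):

uv run --with jupyter jupyter notebook evaluate/display_benchmarks.ipynb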

Modifications

We modified the DeepResearch Bench repo to add a new script, run_single_race_bench.py, which lets you run a single benchmark at a time, as needed during RL training runs.

We modified the Open Deep Research repo to switch search over to Tavily's advanced search answering, which enables training models with smaller context windows.
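
For context, this maps onto Tavily's documented search parameters: an advanced-depth search with include_answer set returns a short synthesized answer in addition to the raw results, and the fork presumably feeds the model that answer rather than full page content, which is what keeps contexts small. A minimal sketch of such a request against Tavily's public API (the query is a placeholder; this illustrates the parameters, not the repo's actual code):

curl -s https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"query": "open deep research agents", "search_depth": "advanced", "include_answer": true}'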

Acknowledgements

Huge thanks to the LangChain and Tavily teams for collaborating on this project and providing the services that the agent is built on. Additionally, we greatly appreciate the overall support, feedback, and adoption that ART has received from the open source community.
