Our code implementation is largely based on the MARTI framework.
```bash
uv venv marllm --python 3.9 && source marllm/bin/activate && uv pip install --upgrade pip
uv pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
uv pip install vllm==0.8.5.post1
uv pip install setuptools && uv pip install flash_attn==2.7.4.post1 --no-build-isolation
cd MARLLM
uv pip install -r requirements.txt
```

Follow the setup instructions for dependencies, including OpenRLHF, Ray, and vLLM.
MARTI supports:
- Built-in DAG-based workflows: debate, mixture-of-agents, chain-of-agents
- Third-party frameworks: AutoGen and CAMEL (Experimental)
Example:
```bash
MODEL_DIR="Path to models, like Qwen2.5-3B"
# See the script for more inference examples
bash scripts/run_test_mas.sh ${MODEL_DIR}
```

MARTI supports:
- Rule-based rewards (Reward Shaping)
- Generative reward models (LLM-as-Judge) (Experimental)
- Tree-based AgentPRM (ImplicitPRM) (Experimental)
- Supervised fine-tuning + RL (e.g., PPO, GRPO)
Example:
```bash
# Minimum hardware requirement for training with 3 Qwen2.5-3B agents: approximately 6×80G GPUs
MODEL_DIR="Path to models, like Qwen2.5-3B"
WANDB_KEY="API key of wandb"
# Train Single Agent with GRPO
bash scripts/run_train_grpo.sh ${MODEL_DIR} ${WANDB_KEY}
# Train Multi-Agent Debate with REINFORCE++
bash scripts/run_train_mad.sh ${MODEL_DIR} ${WANDB_KEY}
```

We introduce asynchronous tool use and workflow support for both single-agent and multi-agent RL pipelines. These features make our framework more modular, efficient, and scalable for a variety of RL scenarios.
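As a concrete illustration of the rule-based reward option listed above, here is a minimal sketch of a reward function for math answers. The function name and scoring scheme are our own illustrative assumptions, not MARTI's actual implementation:

```python
import re

def rule_based_reward(response: str, gold_answer: str) -> float:
    """Hypothetical rule-based reward: exact match on the final boxed answer,
    plus a small format bonus for producing any boxed answer at all."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0  # no parseable answer -> zero reward
    predicted = match.group(1).strip()
    # 1.0 for a correct answer, 0.1 format bonus for a well-formed wrong one
    return 1.0 if predicted == gold_answer.strip() else 0.1

print(rule_based_reward(r"The result is \boxed{42}.", "42"))  # 1.0
```

Rewards of this shape can then be assigned per agent turn or per final answer, depending on the workflow.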
Single Agent Tool Use
- Modular Steps (`marti/worlds/steps`): Each agent's actions are now organized in step files (e.g., `xxx_step.py`), making it easy to customize and extend for new tasks.
- Expanded Toolset (`marti/worlds/tools`): Our agents now have access to a broader range of tools for agentic decision-making, enabling richer interactions and problem-solving capabilities.
```bash
# Multi-turn Code RL
bash scripts/run_train_grpo_code.sh
# Multi-turn Search RL
bash scripts/run_train_grpo_search.sh
```

Note: You can refer to PeterGriffinJin/Search-R1 and bytedance/SandboxFusion respectively to set up the search and code tool services.
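To illustrate what a step file might look like, here is a minimal sketch of a multi-turn code-RL step. All names here (`CodeStep`, `StepResult`, the sandbox interface) are hypothetical assumptions for illustration, not the actual MARTI API:

```python
# Hypothetical sketch of a step file in the style of marti/worlds/steps/xxx_step.py.
# Class names and signatures are illustrative assumptions, not the MARTI API.
import re
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str   # tool output fed back to the agent
    done: bool = False # whether the episode should terminate

class CodeStep:
    """One multi-turn code-RL step: extract a code block from the agent's
    message, execute it in a sandbox, and return the output as observation."""

    def __init__(self, sandbox):
        self.sandbox = sandbox  # e.g. a SandboxFusion-style execution service

    def __call__(self, agent_message: str) -> StepResult:
        code = self._extract_code(agent_message)
        if code is None:
            # No code produced: end the turn with a hint instead of crashing
            return StepResult("No code block found; wrap code in ```python fences.", done=True)
        return StepResult(observation=self.sandbox.run(code))

    @staticmethod
    def _extract_code(message: str):
        m = re.search(r"```python\n(.*?)```", message, re.DOTALL)
        return m.group(1) if m else None
```

A search step would follow the same pattern, swapping the sandbox call for a retrieval-service call.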
Multi-Agent Workflow
- Workflow Orchestration (`marti/worlds/workflows`): We now support orchestrating complex multi-agent environments via modular workflow files (e.g., `xxx_workflow.py`). This allows coordinated interactions between multiple agents in a flexible and easily configurable manner.
- Advanced Processors (`marti/worlds/workflows`): Integrated processors (e.g., `xxx_processor.py`) support advanced reward shaping and custom feedback loops, empowering more sophisticated learning dynamics and agent cooperation/competition.
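As a rough sketch of how a workflow file can orchestrate agents, the following is a minimal debate loop. The function signature and agent interface are illustrative assumptions, not MARTI's actual workflow API:

```python
# Hypothetical sketch in the style of marti/worlds/workflows/xxx_workflow.py.
from typing import Callable, List

def debate_workflow(agents: List[Callable[[str], str]],
                    question: str,
                    num_rounds: int = 2) -> List[List[str]]:
    """Simple multi-agent debate: each round, every agent answers the question
    given the other agents' previous answers; the full transcript is returned
    for a downstream processor to score (e.g., reward shaping)."""
    transcript: List[List[str]] = []
    previous: List[str] = []
    for _ in range(num_rounds):
        context = "\n".join(f"Agent {i}: {a}" for i, a in enumerate(previous))
        prompt = question + ("\n" + context if context else "")
        current = [agent(prompt) for agent in agents]
        transcript.append(current)
        previous = current
    return transcript
```

In an asynchronous setting, the per-round agent calls would be dispatched concurrently rather than sequentially as shown here.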
```bash
# Chain-of-agents (MathChat)
bash scripts/run_train_mathchat_async.sh
# Multi-agent Debate
bash scripts/run_train_mad_async.sh
```

These improvements open up new possibilities for research and deployment in both single-agent and multi-agent RL settings. As always, we're keen for your feedback and contributions!
We employ the MARTI framework to train both base and reasoning models, specifically Qwen2.5-3B and DeepScaleR-1.5B-Preview. For Qwen2.5-3B, we implement DeepSeek-R1 zero-like reinforcement learning training using Level 3-5 samples from the MATH dataset. The DeepScaleR-1.5B-Preview model, which exhibits strong inherent reasoning capabilities but presents training challenges, undergoes Test-Time Reinforcement Learning (TTRL) adaptation on AIME benchmark data. For multi-agent reinforcement learning, we employ a cluster configuration consisting of 3 nodes, each equipped with 8 A800 80GB GPUs, allocating one full node per agent.
We compare non-reasoning and reasoning models under various configurations and show that majority voting consistently outperforms multi-agent workflows when trained conventionally. This reflects known limitations of current LLM-based agent systems, such as poor role adherence and ineffective inter-agent communication.
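For reference, the majority-voting baseline used in this comparison can be sketched as follows (a generic implementation, not MARTI-specific code):

```python
from collections import Counter

def majority_vote(answers):
    """Majority voting over independently sampled answers: return the most
    common answer, with ties broken by first occurrence."""
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(["42", "41", "42", "42", "7"]))  # "42"
```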
To address this, MARTI enhances model reasoning through structured agent interactions. As shown in Figure 2 and Figure 3, our experiments show that:
- MARTI-trained base models outperform standard RL setups and rival instruct-tuned models.
- Large reasoning models trained with MARTI using TTRL achieve state-of-the-art results on challenging tasks (e.g., 66.7 AIME score with Multi-Agent Debates).
- Multi-agent RL consistently surpasses single-agent systems in performance under the same compute budget.
Figure 2: Average scores of Qwen2.5-3B base and instruct models under different budget and settings
Figure 3: Average scores of reasoning models under different budget and settings
We conduct multi-agent debate (MAD) training with Qwen2.5-3B, using REINFORCE++ on Level 3-5 samples from the MATH dataset.
Figure 4: Accuracy of MAD (Qwen2.5-3B, MATH) on AMC and MATH
Figure 5: Training Dynamics of MAD (Qwen2.5-3B, MATH)
We evaluate a mixture-of-agents approach using the Qwen2.5-3B model, trained on Levels 3 through 5 of the MATH-500 training dataset.
Figure 6: Accuracy of MoA (Qwen2.5-3B, MATH) on AMC and MATH
Figure 7: Training Dynamics of MoA (Qwen2.5-3B, MATH)
- Release MARTI Technical Report
- Initial support for agentic tasks (e.g., GAIA benchmark)
- More features are in progress
MARTI is developed primarily based on OpenRLHF. We would like to express our gratitude to the developers of OpenRLHF, as well as to the teams behind vLLM, Ray, and DeepSpeed for their invaluable contributions.
- Project Lead: Kaiyan Zhang
- Agent Group: Runze Liu, Kaiyan Zhang, Kai Tian, Guoli Jia, Xingtai Lv, Che Jiang
- RL Group: Kaiyan Zhang, Xuekai Zhu, Sihang Zeng, Yuchen Fan, Yuxin Zuo
For the full list of contributors, please refer to the author list in the citation. We are also deeply grateful to everyone who engaged in discussions and provided valuable feedback throughout the development of this project.
For issues or inquiries:
- Kaiyan Zhang, Tsinghua University (zhang-ky22@mails.tsinghua.edu.cn)
- Biqing Qi, Shanghai AI Lab (qibiqing@pjlab.org.cn)
If you use MARTI in your research, please cite the project:
@misc{marti2025,
title={MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference},
author={Kaiyan Zhang and Runze Liu and Xuekai Zhu and Kai Tian and Sihang Zeng and Guoli Jia and Yuchen Fan and Xingtai Lv and Yuxin Zuo and Che Jiang and Ziyang Liu and Jianyu Wang and Yuru Wang and Ruotong Zhao and Ermo Hua and Yibo Wang and Shijie Wang and Junqi Gao and Xinwei Long and Youbang Sun and Zhiyuan Ma and Ganqu Cui and Lei Bai and Ning Ding and Biqing Qi and Bowen Zhou},
year={2025},
institution={Tsinghua University and Shanghai AI Lab},
url={https://github.com/TsinghuaC3I/MARTI}
}