README.md: 7 additions & 5 deletions
@@ -148,12 +148,14 @@
## 📰 News
-🎉 **[2025-10-28] DeepCode Achieves State-of-the-Art Performance on PaperBench Code-Dev!**
+🎉 **[2025-10-28] DeepCode Achieves SOTA on PaperBench!**
-- 🏆 **Surpasses Human Experts**: DeepCode achieves **75.9%** on the 3-paper subset, outperforming **Top ML PhD** (72.4%) by **+3.5%**
-- 🥇 **Outperforms Commercial Agents**: **+26.1%** improvement over best commercial code agents (**Cursor, Claude Code, Codex**) with **84.8%** accuracy
-- 🔬 **Advances Scientific Code Generation**: **+22.4%** improvement over PaperCoder, the previous SOTA scientific code agent
-- 🚀 **Beats LLM-Based Agents**: **+30.2%** improvement over best LLM agent frameworks, demonstrating the power of sophisticated agent architecture
+DeepCode sets new benchmarks on OpenAI's PaperBench Code-Dev across all categories:
+
+- 🏆 **Surpasses Human Experts**: **75.9%** (DeepCode) vs. Top Machine Learning PhDs at 72.4% (+3.5%).
+- 🥇 **Outperforms SOTA Commercial Code Agents**: **84.8%** (DeepCode) vs. leading commercial code agents (Cursor, Claude Code, and Codex) (+26.1%).