| BrowseComp-ZH (BC-zh) original (test) | | Pass@1 58.1 | | 45 | 1mo ago |
| BrowseComp+ | Qwen3-235B (w/ Pensieve) | Accuracy 55.33 | | 38 | 24d ago |
| BrowseComp | OpenAI-o3 | Pass@1 50.9 | | 33 | 22d ago |
| xbench | RE-TRAC-30B-A3B | Accuracy 83 | | 30 | 1mo ago |
| DeepResearch Bench official 100-task-subset 1.0 | OAgents-DR | RACE Overall 0.5076 | | 24 | 1mo ago |
| GAIA | RE-TRAC-30B-A3B | Accuracy 78.2 | | 24 | 25d ago |
| DeepResearch Bench | DualGraph | RACE Overall 53.08 | | 22 | 1mo ago |
| BrowseComp | Kimi-K2.5-Agent | Score 74.9 | | 21 | 1mo ago |
| BrowseComp-EN (BC-en) original (test) | | Pass@1 49.7 | | 20 | 1mo ago |
| GAIA text-only original (test) | WebSailor-v2-30B-A3B (RL) | Pass@1 74.1 | | 20 | 1mo ago |
| BrowseComp-zh | | Accuracy 66.6 | | 18 | 1mo ago |
| KDR-Bench | | Average Score 50.2 | | 17 | 8d ago |
| BrowseComp-zh | | BrowseComp-zh Score 81.3 | | 16 | 1mo ago |
| GAIA | THINKMERGE | Pass@1 51.46 | | 16 | 1mo ago |
| HLE | | Accuracy 51 | | 16 | 1mo ago |
| xbench-DS | DeepSeek-V3.1 | Pass@1 71 | | 15 | 1mo ago |
| BrowseComp-ZH | OpenAI-o3 | Pass@1 58.1 | | 15 | 1mo ago |
| GAIA | OpenAI-o3 | Pass@1 70.5 | | 15 | 1mo ago |
| XBench-DeepSearch original (test) | DeepSeek-V3.1 | Pass@1 71 | | 15 | 1mo ago |
| WebWalkerQA original (test) | Tongyi-DeepResearch | Pass@1 72.2 | | 14 | 1mo ago |
| Xbench DeepResearch | | Accuracy 67 | | 14 | 25d ago |
| FRAMES | TOOLSELF | Accuracy 56 | | 14 | 1mo ago |
| HLE text-only original (test) | Tongyi-DeepResearch | Pass@1 32.9 | | 13 | 1mo ago |
| DeepResearch Bench 1.0 (test) | | Overall Score 46.45 | | 12 | 1mo ago |
| ScholarQA CS | GEPA custom | Average Score 70.5 | | 10 | 12d ago |