| Benchmark | Best model | Metric | Value | Entries | Last updated |
|---|---|---|---|---|---|
| BrowseComp | Kimi-K2.5-Agent | Score | 74.9 | 47 | 1mo ago |
| BrowseComp-ZH (BC-zh) original (test) | | Pass@1 | 58.1 | 45 | 3mo ago |
| BrowseComp+ | Qwen3-235B (w/ Pensieve) | Accuracy | 55.33 | 38 | 2mo ago |
| BrowseComp | OpenAI-o3 | Pass@1 | 50.9 | 33 | 2mo ago |
| DeepResearch Bench | AgentDisCo | RACE Overall | 54.02 | 31 | 2d ago |
| xbench | RE-TRAC-30B-A3B | Accuracy | 83 | 30 | 3mo ago |
| DeepResearch Bench official 100-task-subset 1.0 | OAgents-DR | RACE Overall | 0.5076 | 24 | 3mo ago |
| GAIA | RE-TRAC-30B-A3B | Accuracy | 78.2 | 24 | 2mo ago |
| xBench-DS-2505 | | Score | 82 | 22 | 1mo ago |
| ResearchQA | | Score | 79.2 | 21 | 16d ago |
| BrowseComp-EN (BC-en) original (test) | | Pass@1 | 49.7 | 20 | 3mo ago |
| GAIA text-only original (test) | WebSailor-v2-30B-A3B (RL) | Pass@1 | 74.1 | 20 | 3mo ago |
| SQA v2 | DR Tulu-8B (RL) | Score | 88.3 | 18 | 16d ago |
| BrowseComp-zh | | Accuracy | 66.6 | 18 | 3mo ago |
| HealthBench | | Score | 59.5 | 17 | 16d ago |
| GAIA Text-Only | REDSearcher-30B-A3B | Score | 80.1 | 17 | 1mo ago |
| KDR-Bench | | Average Score | 50.2 | 17 | 1mo ago |
| BrowseComp-zh | | BrowseComp-zh Score | 81.3 | 16 | 3mo ago |
| GAIA | THINKMERGE | Pass@1 | 51.46 | 16 | 3mo ago |
| HLE | | Accuracy | 51 | 16 | 3mo ago |
| xbench-DS | DeepSeek-V3.1 | Pass@1 | 71 | 15 | 3mo ago |
| BrowseComp-ZH | OpenAI-o3 | Pass@1 | 58.1 | 15 | 3mo ago |
| GAIA | OpenAI-o3 | Pass@1 | 70.5 | 15 | 3mo ago |
| XBench-DeepSearch original (test) | DeepSeek-V3.1 | Pass@1 | 71 | 15 | 3mo ago |
| DeepResearch Bench (test) | VeriTrace | Comprehensiveness | 56.28 | 14 | 7d ago |