| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Research | xbench | Accuracy 83 | 30 |
| Deep Search | xbench DeepSearch (test) | Accuracy 75 | 26 |
| Web Task Reasoning | XBench (test) | Pass@1 80.8 | 25 |
| Deep Search | XBench DeepSearch | Accuracy 73 | 24 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 75 | 24 |
| Deep Research | xBench-DS-2505 | Score 82 | 22 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score 77.8 | 22 |
| Deep Search | xBench DeepSearch DS-2505 | Score 82 | 20 |
| Search Agent Evaluation | XBench | Average Score 78 | 18 |
| Agentic Search | Xbench DeepSearch 2505 | Accuracy 78 | 18 |
| Web Research | Xbench DeepSearch | Pass@1 64.6 | 18 |
| Multi-turn tool use | Xbench | Pass@1 75.1 | 18 |
| Information-Seeking | XBench 2505 (full) | Pass@1 75 | 17 |
| Deep Search | xbench-DS | Accuracy 75 | 16 |
| Deep Information Retrieval and Research | xbench DeepSearch | Avg@8 77.8 | 16 |
| Deep Research | xbench-DS | Pass@1 71 | 15 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 71 | 15 |
| Web Search | xbench | Average Score 66 | 15 |
| Agentic Search | xbench DeepSearch | Accuracy 61 | 14 |
| Deep Search | xBench DeepSearch (05) | Score 75 | 14 |
| Deep Research | Xbench DeepResearch | Accuracy 67 | 14 |
| General Deep Research Tool Use | Xbench DeepSearch | Success Rate 76 | 12 |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy 0.9 | 12 |
| Deep Research | xBench DS 2510 | Score 75 | 11 |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy 73.3 | 11 |