| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Deep Research | xbench | Accuracy 83 | 30 | |
| Deep Search | xbench DeepSearch (test) | Accuracy 75 | 26 | |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 75 | 24 | |
| Deep Information Search and Synthesis | xbench DeepSearch | Score 77.8 | 22 | |
| Deep Search | xBench DeepSearch DS-2505 | Score 82 | 20 | |
| Web Research | Xbench DeepSearch | Pass@1 64.6 | 18 | |
| Multi-turn tool use | Xbench | Pass@1 75.1 | 18 | |
| Information-Seeking | XBench 2505 (full) | pass@1 75 | 17 | |
| Deep Research | xbench-DS | Pass@1 71 | 15 | |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 71 | 15 | |
| Web Search | xbench | Average Score 66 | 15 | |
| Deep Search | xBench DeepSearch (05) | Score 75 | 14 | |
| Deep Research | Xbench DeepResearch | Accuracy 67 | 14 | |
| General Deep Research Tool Use | Xbench DeepSearch | Success Rate 76 | 12 | |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy 0.9 | 12 | |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy 73.3 | 11 | |
| Agentic Web Interaction | xbench DeepSearch 2510 (test) | Pass@1 66 | 10 | |
| Search | XBench | Score 74 | 9 | |
| Deep Search Reasoning | XBench DeepSearch2505 | Score 41 | 9 | |
| Deep Search | xBench DeepSearch-10 | Score 39 | 8 | |
| Agent Reasoning | xbench (test) | Pass@3 0.66 | 8 | |
| Deep Search | Xbench DeepSearch | Score 81 | 7 | |
| Calibration Performance | xBench DeepSearch | NECE 0.34 | 7 | |
| Deep Search and Information Retrieval | xbench DeepSearch 2510 | Avg@8 75 | 7 | |
| Question Answering | xbench DeepSearch | Accuracy (Pass@4) 56 | 4 | |