| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Deep Research | xbench | Accuracy | 83 | 30 |
| Deep Search | xbench DeepSearch (test) | Accuracy | 75 | 26 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 | 75 | 24 |
| Web Research | Xbench DeepSearch | Pass@1 | 64.6 | 18 |
| Multi-turn tool use | Xbench | Pass@1 | 75.1 | 18 |
| Information-Seeking | XBench 2505 (full) | Pass@1 | 75 | 17 |
| Deep Research | xbench-DS | Pass@1 | 71 | 15 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 | 71 | 15 |
| Web Search | xbench | Average Score | 66 | 15 |
| Deep Search | xBench DeepSearch (05) | Score | 75 | 14 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score | 77.8 | 14 |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy | 0.9 | 12 |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy | 73.3 | 11 |
| Deep Search Reasoning | XBench DeepSearch2505 | Score | 41 | 9 |
| Deep Search | xBench DeepSearch-10 | Score | 39 | 8 |
| Agent Reasoning | xbench (test) | Pass@3 | 0.66 | 8 |
| Search | XBench | Score | 45 | 7 |
| Question Answering | xbench DeepSearch | Accuracy (Pass@4) | 56 | 4 |
| Deep Research | Xbench DeepResearch | Accuracy | 46 | 4 |
| Out-of-Distribution Evaluation | xBench-DS (OOD) | Avg@4 | 46 | 3 |
| Information-Seeking | XBench v2510 (full) | Pass@1 | 45 | 2 |
| Deep Search | xbench DeepSearch (leaderboard) | Metric | - | 0 |
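The rows above mix several metrics (Pass@1, Pass@3, Pass@4, Avg@4, plain accuracy and score), so their numbers are not directly comparable. The sketch below shows one common way to compute Pass@k (the unbiased estimator often used for code-generation benchmarks) and Avg@k (mean accuracy over k runs) from per-question run outcomes. It is an illustration under assumptions, not the evaluation protocol of any paper behind these rows; the function names and the toy `runs` data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed definition).

    n: attempts sampled, c: correct attempts, k: budget with 0 < k <= n.
    Returns the probability that at least one of k attempts, drawn without
    replacement from the n attempts, is correct.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def avg_at_k(outcomes: list[bool]) -> float:
    """Avg@k for one question: fraction of the k runs that were correct."""
    return sum(outcomes) / len(outcomes)


def benchmark_scores(results: list[list[bool]], k: int) -> dict[str, float]:
    """Aggregate per-question run outcomes into percentage scores.

    results[i] holds the correct/incorrect outcome of each run on question i;
    every question is assumed to have the same number of runs n >= k.
    """
    pass_k = [pass_at_k(len(r), sum(r), k) for r in results]
    avg = [avg_at_k(r) for r in results]
    return {
        f"Pass@{k}": 100 * sum(pass_k) / len(pass_k),
        f"Avg@{len(results[0])}": 100 * sum(avg) / len(avg),
    }


# Hypothetical toy data: 3 questions, 4 runs each.
runs = [
    [True, False, True, False],
    [False, False, False, False],
    [True, True, True, True],
]
print(benchmark_scores(runs, k=1))  # {'Pass@1': 50.0, 'Avg@4': 50.0}
print(benchmark_scores(runs, k=4))  # {'Pass@4': 66.66666666666667, 'Avg@4': 50.0}
```

With k = 1 the estimator reduces to c/n, so Pass@1 averaged over n runs matches Avg@n. Larger k credits a question if any run solves it, which is why a Pass@4 figure is typically higher than a Pass@1 or Avg@4 figure on the same runs.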