
xbench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Deep Research | xbench | Accuracy 83 | 30 |
| Deep Search | xbench DeepSearch (test) | Accuracy 75 | 26 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 75 | 24 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score 77.8 | 22 |
| Deep Search | xBench DeepSearch DS-2505 | Score 82 | 20 |
| Web Research | Xbench DeepSearch | Pass@1 64.6 | 18 |
| Multi-turn tool use | Xbench | Pass@1 75.1 | 18 |
| Information-Seeking | XBench 2505 (full) | pass@1 75 | 17 |
| Deep Research | xbench-DS | Pass@1 71 | 15 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 71 | 15 |
| Web Search | xbench | Average Score 66 | 15 |
| Deep Search | xBench DeepSearch (05) | Score 75 | 14 |
| Deep Research | Xbench DeepResearch | Accuracy 67 | 14 |
| General Deep Research Tool Use | Xbench DeepSearch | Success Rate 76 | 12 |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy 0.9 | 12 |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy 73.3 | 11 |
| Agentic Web Interaction | xbench DeepSearch 2510 (test) | Pass@1 66 | 10 |
| Search | XBench | Score 74 | 9 |
| Deep Search Reasoning | XBench DeepSearch2505 | Score 41 | 9 |
| Deep Search | xBench DeepSearch-10 | Score 39 | 8 |
| Agent Reasoning | xbench (test) | Pass@3 0.66 | 8 |
| Deep Search | Xbench DeepSearch | Score 81 | 7 |
| Calibration Performance | xBench DeepSearch | NECE 0.34 | 7 |
| Deep Search and Information Retrieval | xbench DeepSearch 2510 | Avg@8 75 | 7 |
| Question Answering | xbench DeepSearch | Accuracy (Pass@4) 56 | 4 |
Showing 25 of 28 rows