
xbench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Deep Research | xbench | Accuracy 83 | 30 |
| Deep Search | xbench DeepSearch (test) | Accuracy 75 | 26 |
| Web Task Reasoning | XBench (test) | Pass@1 80.8 | 25 |
| Deep Search | XBench DeepSearch | Accuracy 73 | 24 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 75 | 24 |
| Deep Research | xBench-DS-2505 | Score 82 | 22 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score 77.8 | 22 |
| Deep Search | xBench DeepSearch DS-2505 | Score 82 | 20 |
| Search Agent Evaluation | XBench | Average Score 78 | 18 |
| Agentic Search | Xbench DeepSearch 2505 | Accuracy 78 | 18 |
| Web Research | Xbench DeepSearch | Pass@1 64.6 | 18 |
| Multi-turn tool use | Xbench | Pass@1 75.1 | 18 |
| Information-Seeking | XBench 2505 (full) | pass@1 75 | 17 |
| Deep Search | xbench-DS | Accuracy 75 | 16 |
| Deep Information Retrieval and Research | xbench DeepSearch | Avg@8 77.8 | 16 |
| Deep Research | xbench-DS | Pass@1 71 | 15 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 71 | 15 |
| Web Search | xbench | Average Score 66 | 15 |
| Agentic Search | xbench DeepSearch | Accuracy 61 | 14 |
| Deep Search | xBench DeepSearch (05) | Score 75 | 14 |
| Deep Research | Xbench DeepResearch | Accuracy 67 | 14 |
| General Deep Research Tool Use | Xbench DeepSearch | Success Rate 76 | 12 |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy 0.9 | 12 |
| Deep Research | xBench DS 2510 | Score 75 | 11 |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy 73.3 | 11 |
Showing 25 of 43 rows