
xbench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Deep Research | xbench | Accuracy | 83 | 30 |
| Deep Search | xbench DeepSearch (test) | Accuracy | 75 | 26 |
| Deep-search QA | Xbench-DeepSearch (test) | Pass@1 | 75 | 24 |
| Web Research | Xbench DeepSearch | Pass@1 | 64.6 | 18 |
| Multi-turn tool use | Xbench | Pass@1 | 75.1 | 18 |
| Information-Seeking | XBench 2505 (full) | Pass@1 | 75 | 17 |
| Deep Research | xbench-DS | Pass@1 | 71 | 15 |
| Deep Research | XBench-DeepSearch original (test) | Pass@1 | 71 | 15 |
| Web Search | xbench | Average Score | 66 | 15 |
| Deep Search | xBench DeepSearch (05) | Score | 75 | 14 |
| Deep Information Search and Synthesis | xbench DeepSearch | Score | 77.8 | 14 |
| Expert-Level Reasoning | XBench-DeepSearch 1.0 (test) | Inference Accuracy | 0.9 | 12 |
| Web Agent Search and Reasoning | xbench deepsearch | Accuracy | 73.3 | 11 |
| Deep Search Reasoning | XBench DeepSearch2505 | Score | 41 | 9 |
| Deep Search | xBench DeepSearch-10 | Score | 39 | 8 |
| Agent Reasoning | xbench (test) | Pass@3 | 0.66 | 8 |
| Search | XBench | Score | 45 | 7 |
| Question Answering | xbench DeepSearch | Accuracy (Pass@4) | 56 | 4 |
| Deep Research | Xbench DeepResearch | Accuracy | 46 | 4 |
| Out-of-Distribution Evaluation | xBench-DS (OOD) | Avg@4 | 46 | 3 |
| Information-Seeking | XBench v2510 (full) | Pass@1 | 45 | 2 |
| Deep Search | xbench DeepSearch (leaderboard) | - | - | 0 |