Our new X account is live! Follow @wizwand_team for updates
SOTA Deep Research benchmarks and papers with code | Wizwand
Deep Research
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| BrowseComp-ZH (BC-zh) original (test) | OpenAI-o3 | Pass@1 | 58.1 | 45 | 4d ago |
| xbench | RE-TRAC-30B-A3B | Accuracy | 83 | 30 | 4d ago |
| DeepResearch Bench official 100-task-subset 1.0 | OAgents-DR | RACE Overall | 0.5076 | 24 | 4d ago |
| DeepResearch Bench | DualGraph | RACE Overall | 53.08 | 22 | 4d ago |
| BrowseComp | Kimi-K2.5-Agent | Score | 74.9 | 21 | 4d ago |
| BrowseComp-EN (BC-en) original (test) | OpenAI-o3 | Pass@1 | 49.7 | 20 | 4d ago |
| GAIA text-only original (test) | WebSailor-v2-30B-A3B (RL) | Pass@1 | 74.1 | 20 | 4d ago |
| BrowseComp+ | Qwen3-235B (w/ Pensieve) | Accuracy | 55.33 | 19 | 4d ago |
| BrowseComp-zh | GLM-4.7-358B | Accuracy | 66.6 | 18 | 4d ago |
| BrowseComp-zh | Seed1.8 | BrowseComp-zh Score | 81.3 | 16 | 4d ago |
| GAIA | THINKMERGE | Pass@1 | 51.46 | 16 | 4d ago |
| HLE | Kimi-K2-Thinking-1T | Accuracy | 51 | 16 | 4d ago |
| xbench-DS | DeepSeek-V3.1 | Pass@1 | 71 | 15 | 4d ago |
| BrowseComp-ZH | OpenAI-o3 | Pass@1 | 58.1 | 15 | 4d ago |
| BrowseComp | OpenAI-o3 | Pass@1 | 50.9 | 15 | 4d ago |
| GAIA | OpenAI-o3 | Pass@1 | 70.5 | 15 | 4d ago |
| XBench-DeepSearch original (test) | DeepSeek-V3.1 | Pass@1 | 71 | 15 | 4d ago |
| GAIA | RE-TRAC-30B-A3B | Accuracy | 78.2 | 14 | 4d ago |
| WebWalkerQA original (test) | Tongyi-DeepResearch | Pass@1 | 72.2 | 14 | 4d ago |
| FRAMES | TOOLSELF | Accuracy | 56 | 14 | 4d ago |
| HLE text-only original (test) | Tongyi-DeepResearch | Pass@1 | 32.9 | 13 | 4d ago |
| DeepResearch Bench 1.0 (test) | OpenAI-DeepResearch | Overall Score | 46.45 | 12 | 4d ago |
| ADR-Bench Finance & Law | Gemini DeepResearch | Score Range | 2,535 | 9 | 4d ago |
| BC (test) | Our best agent | Mean Correct Answer Rate | 620 | 8 | 4d ago |
| GAIA (test) | Our best agent | Mean Correct Rate | 49.2 | 8 | 4d ago |
Showing 25 of 40 rows