Deep Research on WebWalkerQA original (test)
Top result: Tongyi-DeepResearch, 72.2 Pass@1
[Chart: Pass@1 (y-axis) by date; data point at Jan 26, 2026]
Updated 4d ago
Evaluation Results
| Method | Backbone Group | Date | Pass@1 |
|---|---|---|---|
| Tongyi-DeepResearch | Medium... | 2026.01 | 72.2 |
| OpenAI-o3 | Large S... | 2026.01 | 71.7 |
| Kimi-K2 | Large S... | 2026.01 | 63 |
| Claude-4-Sonnet | Large S... | 2026.01 | 61.7 |
| OffSeeker-8B (DPO) | Small S... | 2026.01 | 61.7 |
| DeepSeek-V3.1 | Large S... | 2026.01 | 61.2 |
| OffSeeker-8B (SFT) | Small S... | 2026.01 | 60 |
| WebExplorer-8B (RL) | Small S... | 2026.01 | 58 |
| WebShaper-72B | Medium... | 2026.01 | 52.2 |
| WebShaper-32B | Medium... | 2026.01 | 51.4 |
| MiroThinker-32B-DPO-v0.1 | Medium... | 2026.01 | 49.3 |
| WebDancer-QwQ | Medium... | 2026.01 | 47.9 |
| MiroThinker-8B-DPO-v0.1 | Small S... | 2026.01 | 45.7 |
| ASearcher-Web-QwQ | Medium... | 2026.01 | 34.3 |