Deep Research on HLE text-only original (test)
Top result: Tongyi-DeepResearch, 32.9 Pass@1
[Chart: Pass@1 over time; plotted point dated Jan 26, 2026]
Evaluation Results
Method                        Backbone Group  Date     Pass@1
Tongyi-DeepResearch           Medium...       2026.01  32.9
WebSailor-v2-30B-A3B (RL)     Medium...       2026.01  30.6
DeepSeek-V3.1                 Large S...      2026.01  29.8
DeepSeek-V3.2                 Large S...      2026.01  27.2
WebSailor-v2-30B-A3B (SFT)    Medium...       2026.01  23.9
Claude-4-Sonnet               Large S...      2026.01  20.3
OpenAI-o3                     Large S...      2026.01  20.2
Kimi-K2                       Large S...      2026.01  18.1
OffSeeker-8B (DPO)            Small S...      2026.01  13.8
ASearcher-Web-QwQ             Medium...       2026.01  12.5
WebExplorer-8B (RL)           Small S...      2026.01  12.4
MiroThinker-32B-DPO-v0.1      Medium...       2026.01  11.8
OffSeeker-8B (SFT)            Small S...      2026.01  11.7