High-Level Reasoning on HLE
Best Average Score: 26.6 (OpenAI DeepResearch)
[Chart: Average Score over time; date shown on the axis: Jan 29, 2026]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Average Score |
|---|---|---|---|
| OpenAI DeepResearch | | 2026.01 | 26.6 |
| OpenAI-o3 | | 2026.01 | 20.2 |
| Claude-4-Sonnet | | 2026.01 | 20.2 |
| o1-preview | | 2026.01 | 11.1 |
| Search-o1 | Backbone=QwQ-32B-Preview | 2026.01 | 10.8 |
| Reagent-U | Backbone=Qwen3-8B | 2026.01 | 10.8 |
| ARPO | Backbone=Qwen3-14B | 2026.01 | 10 |
| Atom-Searcher | Backbone=Qwen2.5-7B | 2026.01 | 10 |
| Reagent-R | Backbone=Qwen3-8B | 2026.01 | 10 |
| ARPO | Backbone=Qwen3-8B | 2026.01 | 8.8 |
| DeepSeek-R1-671B | Backbone=DeepSeek-R1-671B | 2026.01 | 8.6 |
| VerlTool | Backbone=Qwen3-8B | 2026.01 | 8.4 |
| Reagent w/o Agent-RRM | Backbone=Qwen3-8B | 2026.01 | 6.8 |
| WebThinker | Backbone=Qwen3-8B | 2026.01 | 6.6 |
| QwQ-32B | Backbone=QwQ-32B | 2026.01 | 6.4 |
| Reagent-C | Backbone=Qwen3-8B, Inf... | 2026.01 | 4.6 |
| Qwen3-8B | Backbone=Qwen3-8B | 2026.01 | 4 |