Deep research on HLE
[Chart: Accuracy over time. Top result: Kimi-K2-Thinking-1T, 51 accuracy, Feb 2, 2026. Updated 4d ago.]
Evaluation Results
Method                       Model Category  Date     Accuracy
Kimi-K2-Thinking-1T          Large O...      2026.02  51
GLM-4.7-358B                 Large O...      2026.02  42.8
GPT-5-high                   Closed-...      2026.02  42
DeepSeek-V3.2-Thinking-685B  Large O...      2026.02  40.8
Gemini-3-pro                 Closed-...      2026.02  38.3
Tongyi-DeepResearch-30B-A3B  Interme...      2026.02  32.9
Claude-4.5-Sonnet            Closed-...      2026.02  32
MiniMax-M2-229B              Large O...      2026.02  31.8
RE-TRAC-30B-A3B              Interme...      2026.02  31.5
WebSailor-V2-30B-A3B (RL)    Interme...      2026.02  30.6
IterResearch-30B-A3B         Interme...      2026.02  28.8
OpenAI DeepResearch          Closed-...      2026.02  26.6
o3                           Closed-...      2026.02  24.9
RE-TRAC-4B                   Compact...      2026.02  22.2
AgentCPM-Explore-4B          Compact...      2026.02  19.1
WebExplorer-8B               Compact...      2026.02  17.3