Reasoning & General on HLE
[Chart: Score over time. Series: Score, Full set Score. Highlighted point: Kimi K2.5, Score 51.8, Feb 17, 2026.]
Updated 4d ago
Evaluation Results

Method             Tool Use             Date       Score   Full set Score
Kimi K2.5          w/ Tools             2026.02    51.8    -
GLM-5              w/ Tools             2026.02    50.4    -
GLM-4.7            w/ Tools             2026.02    42.8    -
DeepSeek-V3.2      w/ Tools             2026.02    40.8    -
Gemini 3 Pro       None                 2026.02    37.2    -
GPT-5.2 (xhigh)    None                 2026.02    35.4    -
Kimi K2.5          None                 2026.02    31.5    -
GLM-5              None                 2026.02    30.5    -
Claude Opus 4.5    None                 2026.02    28.4    -
DeepSeek-V3.2      None                 2026.02    25.1    -
GLM-4.7            None                 2026.02    24.8    -
Claude Opus 4.5    w/ Tools, Eva...     2026.02    -       43.4
Gemini 3 Pro       w/ Tools, Eva...     2026.02    -       45.8
GPT-5.2 (xhigh)    w/ Tools, Eva...     2026.02    -       45.5