Reasoning & General on HLE Full
[Chart: Score (%) over time. Top result: Kimi K2.5, 0.502, Feb 2, 2026]
Updated 4d ago
Evaluation Results
Method            Evaluation Protocol   Date      Score (%)
Kimi K2.5         w/...                 2026.02   0.502
Gemini 3 Pro      w/...                 2026.02   0.458
GPT-5.2 (xhigh)   w/...                 2026.02   0.455
Claude Opus 4.5   w/...                 2026.02   0.432
DeepSeek-V3.2     w/...                 2026.02   0.408
Gemini 3 Pro      st...                 2026.02   0.375
GPT-5.2 (xhigh)   st...                 2026.02   0.345
Claude Opus 4.5   st...                 2026.02   0.308
Kimi K2.5         st...                 2026.02   0.301
DeepSeek-V3.2     st...                 2026.02   0.251