Hard Reasoning on HLE
Top result: Gemini-3.0 Pro, 37.7 Pass@1 (Dec 2, 2025). Updated 3d ago.
Evaluation Results
Method                                     Date     Pass@1
Gemini-3.0 Pro                             2025.12  37.7
GPT-5 High                                 2025.12  26.3
DeepSeek-V3.2 (thinking mode=true, te...)  2025.12  25.1
Kimi-K2 (thinking mode=true)               2025.12  23.9
DeepSeek-V3.2 (thinking mode=true, te...)  2025.12  23.9
Claude-4.5-Sonnet                          2025.12  13.7
MiniMax M2                                 2025.12  12.5