Problem Solving and Unsolvability Detection on AIME 24-25
[Chart: Solvable Accuracy by date. Labeled point: Gemini-3, 95. Date shown: Dec 1, 2025. Available metrics: Solvable Accuracy, Unsolvable Detection Rate, Mean Score.]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | Solvable Accuracy | Unsolvable Detection Rate | Mean Score |
|---|---|---|---|---|---|
| Gemini-3 | Model Scale=3 | 2025.12 | 95 | 21.2 | 58.1 |
| Deepseek-V3.2-R | Model Scale=V3.2-R | 2025.12 | 85 | 40.3 | 62.7 |
| Qwen3-4B + UnsolvableRL | Model Scale=4B, Traini... | 2025.12 | 69.6 | 40.4 | 55 |
| Qwen3-4B Instruct | Model Scale=4B, Traini... | 2025.12 | 67.9 | 14.8 | 41.4 |
| GPT-5.1-Low | Model Scale=5.1-Low | 2025.12 | 61.7 | 42.5 | 52.1 |
| Qwen3-1.7B Instruct | Model Scale=1.7B, Trai... | 2025.12 | 38.3 | 21.2 | 29.8 |
| Qwen3-1.7B + UnsolvableRL | Model Scale=1.7B, Trai... | 2025.12 | 35.4 | 28.7 | 32.1 |
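
The page does not define how Mean Score is computed. The reported values do appear consistent with the arithmetic mean of Solvable Accuracy and Unsolvable Detection Rate, rounded to one decimal place. The sketch below checks that reading against the rows above. The names `ROWS`, `mean_of_two`, and `check_rows` are illustrative and not part of the benchmark, and the averaging formula itself is an assumption inferred from the numbers.

```python
# Rows copied from the leaderboard:
# (method, solvable accuracy, unsolvable detection rate, reported mean score)
ROWS = [
    ("Gemini-3", 95.0, 21.2, 58.1),
    ("Deepseek-V3.2-R", 85.0, 40.3, 62.7),
    ("Qwen3-4B + UnsolvableRL", 69.6, 40.4, 55.0),
    ("Qwen3-4B Instruct", 67.9, 14.8, 41.4),
    ("GPT-5.1-Low", 61.7, 42.5, 52.1),
    ("Qwen3-1.7B Instruct", 38.3, 21.2, 29.8),
    ("Qwen3-1.7B + UnsolvableRL", 35.4, 28.7, 32.1),
]

# Half a unit in the last reported decimal, plus slack for floating-point error.
TOLERANCE = 0.05 + 1e-9


def mean_of_two(solvable, unsolvable):
    """Assumed definition of Mean Score: plain average of the two metrics."""
    return (solvable + unsolvable) / 2


def check_rows(rows):
    """Return the methods whose reported mean is NOT within rounding
    distance of the assumed average."""
    return [
        name
        for name, solvable, unsolvable, reported in rows
        if abs(mean_of_two(solvable, unsolvable) - reported) > TOLERANCE
    ]


if __name__ == "__main__":
    mismatches = check_rows(ROWS)
    print("rows inconsistent with a simple average:", mismatches)
```

Under this reading, a model with very high Solvable Accuracy can still rank below another on Mean Score if it rarely flags unsolvable problems, as with Gemini-3 (58.1) and Deepseek-V3.2-R (62.7).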