Mathematical Reasoning on AIME 2025 (pass@16)
Chart: Pass@16 by date, Sep 27, 2025 to Jan 13, 2026 (best result: Ministral 3, 85)
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Pass@16 (%) |
|---|---|---|---|
| Ministral 3 | Model size 14B | 2026.01 | 85 |
| Qwen3-VL | Model size 8B | 2026.01 | 79.8 |
| Ministral 3 | Model size 8B | 2026.01 | 78.7 |
| Qwen 3 | Model size 14B | 2026.01 | 73.7 |
| Ministral 3 | Model size 3B | 2026.01 | 72.1 |
| Qwen3-VL | Model size 4B | 2026.01 | 69.7 |
| GEB-1/π | Backbone Qwen2.5-7B | 2025.09 | 29.48 |
| GEB-arctanhπ | Backbone Qwen2.5-7B | 2025.09 | 29.38 |
| GEB-π | Backbone Qwen2.5-7B | 2025.09 | 28.23 |
| DPO | Backbone Qwen2.5-7B | 2025.09 | 23.23 |
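The page does not say how Pass@16 is computed. A common choice is the unbiased pass@k estimator: generate n ≥ k samples per problem, count the c correct ones, and estimate the chance that at least one of k samples drawn from them is correct. The sketch below assumes that estimator and averages it over problems as a percentage; the function names and the per-problem data in the usage line are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem (assumed estimator).

    n: samples generated, c: samples judged correct, k: sampling budget.
    Returns the probability that at least one of k samples drawn without
    replacement from the n generated ones is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k incorrect samples: every k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, reported as a percentage.

    results: hypothetical list of (n, c) pairs, one per problem.
    """
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical example: 4 problems, 16 samples each.
print(benchmark_pass_at_k([(16, 0), (16, 1), (16, 16), (16, 3)], k=16))
```

With n = k = 16, the estimator reduces to "solved if any of the 16 samples is correct", so three of the four hypothetical problems count and the script prints 75.0. Drawing n > k samples lowers the variance of the estimate without changing what it measures.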