Mathematical Reasoning on AIME 2025 (Pass@k and Voting Metrics)
[Chart: Pass@1 by method. Highlighted point: Qwen3-235B-A22B-Thinking-2507, Pass@1 95.4, Dec 11, 2025. Selectable metrics: Pass@1, Majority Voting@8, Best-of-8, Verifier Voting@8, Pass@8.]
Evaluation Results
| Method | Method (details) | Links | Pass@1 | Majority Voting@8 | Best-of-8 | Verifier Voting@8 | Pass@8 |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B-Thinking-2507 | Policy=Qwen3-235B-A22B... | 2025.12 | 95.4 | 96.7 | 97.9 | 98.3 | 100 |
| gpt-oss-120b | Policy=gpt-oss-120b | 2025.12 | 92.3 | 93.3 | 95.1 | 96.7 | 96.7 |
| DeepSeek-R1-0528 | Policy=DeepSeek-R1-0528 | 2025.12 | 87.1 | 88.3 | 87.5 | 90.8 | 96.7 |
| QwQ-32B | Policy=QwQ-32B | 2025.12 | 70 | 78.3 | 76.2 | 80 | 83.3 |
| DeepSeek-R1-Distill-Qwen-32B | Policy=DeepSeek-R1-Dis... | 2025.12 | 55.9 | 63.8 | 66.6 | 68 | 76.7 |
| DeepSeek-R1-Distill-Llama-70B | Policy=DeepSeek-R1-Dis... | 2025.12 | 44 | 50 | 56.1 | 57.8 | 70 |
| DeepSeek-R1-Distill-Qwen-7B | Policy=DeepSeek-R1-Dis... | 2025.12 | 39.1 | 49.2 | 56.3 | 55.4 | 66.7 |
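
This page does not define the five metrics. The sketch below shows one common way to compute each for a single problem from 8 sampled answers. Every function name is hypothetical, and the per-sample `scores` stand in for reward-model or verifier outputs. Two conventions here are assumptions, not the leaderboard's confirmed method: Best-of-8 is taken to mean picking the single highest-scored sample, and Verifier Voting@8 is taken to mean score-weighted voting.

```python
from collections import Counter

def pass_at_1(samples, answer):
    # Fraction of sampled answers that are correct (expected single-sample accuracy).
    return sum(s == answer for s in samples) / len(samples)

def pass_at_k(samples, answer):
    # 1.0 if at least one of the k samples is correct, else 0.0.
    return float(any(s == answer for s in samples))

def majority_vote(samples, answer):
    # Pick the most frequent answer; ties go to the answer seen first.
    top, _ = Counter(samples).most_common(1)[0]
    return float(top == answer)

def best_of_n(samples, scores, answer):
    # Assumed convention: pick the single sample with the highest score.
    best = max(range(len(samples)), key=lambda i: scores[i])
    return float(samples[best] == answer)

def verifier_vote(samples, scores, answer):
    # Assumed convention: sum scores per distinct answer, pick the largest total.
    totals = {}
    for s, w in zip(samples, scores):
        totals[s] = totals.get(s, 0.0) + w
    return float(max(totals, key=totals.get) == answer)

def benchmark_score(per_problem_values):
    # Average a per-problem metric over all problems and report it as a percentage.
    return 100.0 * sum(per_problem_values) / len(per_problem_values)

# Illustrative data only: 8 sampled answers for one problem whose true answer is 204.
samples = [204, 204, 17, 204, 17, 17, 17, 9]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4, 0.1, 0.05]
print(pass_at_1(samples, 204), majority_vote(samples, 204),
      best_of_n(samples, scores, 204), verifier_vote(samples, scores, 204),
      pass_at_k(samples, 204))
```

In this made-up case the wrong answer 17 wins the plain majority vote, while both score-based selections recover 204. That is one way the Best-of-8 and Verifier Voting@8 columns can exceed Majority Voting@8. Pass@8 only needs one correct sample, so it acts as an upper bound on the selection-based metrics.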