Mathematical Problem Solving on AIME 25 (Pass@1, Pass@16)
[Chart: Pass@1 and Pass@16 by date. Highlighted point: Base Model, Pass@1 = 42.1. Date shown: Apr 20, 2026.]
Updated 1mo ago
Evaluation Results
Method      Configuration              Date     Pass@1  Pass@16
Base Model  Model Scale=Qwen3-4B,...   2026.04  42.1    62.3
MIXED-CUTS  Model Scale=Qwen3-4B,...   2026.04  41.7    71.9
MIXED-CUTS  Model Scale=Qwen3-1.7B...  2026.04  28.1    52.5
GRPO        Model Scale=Qwen3-4B,...   2026.04  26.6    57.9
Base Model  Model Scale=Qwen3-1.7B...  2026.04  24.9    44.5
GRPO        Model Scale=Qwen3-1.7B...  2026.04  22.8    44.5
Base Model  Model Scale=Qwen3-4B,...   2026.04  21.5    48
Base Model  Model Scale=Qwen3-1.7B...  2026.04  11.7    27.8