Mathematical Problem Solving on AIME24
Top result: 54.1 Pass@1 (Base Model)
[Chart: Pass@1 and Pass@16 results by date (Apr 20, 2026)]
Evaluation Results
| Method | Configuration | Date | Pass@1 | Pass@16 |
|---|---|---|---|---|
| Base Model | Model Scale=Qwen3-4B,... | 2026.04 | 54.1 | 75.5 |
| MIXED-CUTS | Model Scale=Qwen3-4B,... | 2026.04 | 46 | 73.5 |
| GRPO | Model Scale=Qwen3-4B,... | 2026.04 | 32.5 | 63.8 |
| MIXED-CUTS | Model Scale=Qwen3-1.7B... | 2026.04 | 32.3 | 62 |
| GRPO | Model Scale=Qwen3-1.7B... | 2026.04 | 29.5 | 60.7 |
| Base Model | Model Scale=Qwen3-1.7B... | 2026.04 | 28.9 | 62.8 |
| Base Model | Model Scale=Qwen3-4B,... | 2026.04 | 24.2 | 54 |
| Base Model | Model Scale=Qwen3-1.7B... | 2026.04 | 12.9 | 36.5 |
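The page does not say how Pass@1 and Pass@16 were computed. A common choice is the unbiased pass@k estimator from the Codex evaluation (Chen et al., 2021): draw n samples per problem, count the c correct ones, and estimate the chance that at least one of k samples is correct as 1 - C(n-c, k) / C(n, k). The sketch below assumes that estimator and reports the score as a percentage averaged over problems; the function names and the per-problem counts are hypothetical, not taken from any of the listed methods.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for a single problem (assumed estimator).

    n: number of sampled solutions, c: how many of them are correct,
    k: attempt budget (1 <= k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are wrong, every k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k samples drawn without replacement are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


# Hypothetical per-problem (n, c) counts with n = 16 samples each.
counts = [(16, 4), (16, 0)]
print(benchmark_pass_at_k(counts, k=1))   # 12.5
print(benchmark_pass_at_k(counts, k=16))  # 50.0
```

Under this estimator pass@k never decreases as k grows, which matches every row above reporting a higher Pass@16 than Pass@1.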