Math Problem Solving on AIME 2025 (test)
[Chart: Accuracy. Top result: SelfBudgeter, 30 (May 16, 2025). Metrics shown: Accuracy, Average Response Length.]
Updated 1mo ago
Evaluation Results
Method | Setting | Date | Accuracy | Average Response Length
--- | --- | --- | --- | ---
SelfBudgeter | Model Scale=7B | 2025.05 | 30 | 12,241.84
DeepSeek-R1-Distill-Qwen | Model Scale=7B | 2025.05 | 28.89 | 22,158.79
DeepSeek-R1-Distill-Qwen | Model Scale=1.5B | 2025.05 | 22.22 | 14,444.03
E1-Math-1.5B | Truncation Budget=4K, 1K | 2025.05 | 21.11 | 5,578.13
SelfBudgeter | Model Scale=1.5B | 2025.05 | 21.11 | 4,288.1
L1-Max | Specified Limit=3600 t... | 2025.05 | 17.88 | 5,213.89
Eurus-2-7B-PRIME | Model Scale=7B | 2025.05 | 15.56 | 1,254.52
Qwen-2.5-7B-Simple-RL | Model Scale=7B | 2025.05 | 6.67 | 1,429.94
E1-Math-1.5B | Truncation Budget=0.5K... | 2025.05 | 4.44 | 3,008.44