Math Problem Solving on GSM8K (test)
[Chart: Accuracy. Highlighted result: Eurus-2-7B-PRIME, 90.98 (date shown: May 16, 2025). Selectable metrics: Accuracy, Average Response Length.]
Updated 1mo ago
Evaluation Results
Method                     Setting                     Date      Accuracy   Average Response Length
Eurus-2-7B-PRIME           Model Scale=7B              2025.05   90.98        302.72
SelfBudgeter               Model Scale=7B              2025.05   90.3         991.13
DeepSeek-R1-Distill-Qwen   Model Scale=7B              2025.05   87.09      1,918.21
SelfBudgeter               Model Scale=1.5B            2025.05   84.1       1,231.79
L1-Max                     Specified Limit=3600 t...   2025.05   79.56        571.72
Qwen-2.5-7B-Simple-RL      Model Scale=7B              2025.05   75.94        519.07
DeepSeek-R1-Distill-Qwen   Model Scale=1.5B            2025.05   73.09      2,865.08
E1-Math-1.5B               Truncation Budget=4K, 1K    2025.05   72.1       1,299.62
E1-Math-1.5B               Truncation Budget=0.5K...   2025.05   60.2       1,205.21