Mathematical Reasoning on MATH500 (Avg@4, #Token)
[Chart: Avg@4 Score and Output Length (#Token). Top entry: GR3, Avg@4 Score 94, Mar 11, 2026. Updated 1mo ago.]
Evaluation Results
Method                    Model size   Date      Avg@4 Score   Output Length (#Token)
GR3                       7B           2026.03   94            1,764
DLER–R1–7B                7B           2026.03   93.2          1,650
Laser–DE–L4096–7B         7B           2026.03   92.4          1,580
DeepSeek–R1–Distill–7B    7B           2026.03   92.1          3,994
GRPO                      7B           2026.03   92.1          5,006
AdaptThink–7B             7B           2026.03   90.6          2,011
LCR1–7B                   7B           2026.03   89.1          1,546
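
The page does not define its two metrics. As a rough sketch, assuming Avg@4 denotes accuracy averaged over four sampled responses per problem and Output Length denotes the mean token count over all sampled responses, they might be computed as below. The function names and input layout are hypothetical and are not taken from any of the listed methods.

```python
from statistics import mean


def avg_at_k(correct, k=4):
    """Mean accuracy (in percent) over k sampled responses per problem.

    `correct` is a list with one entry per problem; each entry is a list
    of k booleans marking whether each sampled response was correct.
    ASSUMPTION: this is one common reading of "Avg@4", not the page's
    stated definition.
    """
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(samples)}")
        per_problem.append(sum(samples) / k)
    return 100.0 * mean(per_problem)


def mean_output_length(token_counts):
    """Mean number of output tokens over every sampled response.

    `token_counts` mirrors `correct`: one list of per-sample token
    counts for each problem.
    """
    return mean(n for samples in token_counts for n in samples)
```

Under this reading, a model that solves every problem in all four samples scores 100, and a model that solves half of its samples on every problem scores 50.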