Mathematical Reasoning on MATH500 (Avg@4)
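The page does not define Avg@4. Under the common convention, which is assumed here, each of MATH500's 500 problems gets 4 independently sampled responses. Each problem's correctness is averaged over its 4 samples, and those per-problem scores are then averaged over all problems. The minimal sketch below computes that quantity. The function name `avg_at_k` and its input format are hypothetical, and the toy data is not real MATH500 output.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Avg@k as a percentage (assumed convention).

    correct[i][j] is True if sample j for problem i was judged correct
    (hypothetical input format). Every problem must have the same k.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem must have the same, nonzero number of samples")
    # Per-problem accuracy over its k samples, then mean over problems.
    per_problem = [sum(row) / k for row in correct]
    return 100.0 * mean(per_problem)


# Toy example: 3 problems, 4 samples each (illustrative only).
results = [
    [True, True, True, True],     # 4/4 correct
    [True, False, True, False],   # 2/4 correct
    [False, False, False, True],  # 1/4 correct
]
print(round(avg_at_k(results), 2))  # 58.33
```

Because every problem has the same k, this equals total correct samples divided by total samples (7/12 here). Averaging over several samples mainly reduces the variance that a single sampled answer per problem would introduce.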
Chart: Accuracy (Avg@4) over time, Dec 2025 to Mar 2026. Top result: GRPO, 97.5.
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Accuracy (Avg@4) | Token Count |
|---|---|---|---|---|
| GRPO | Backbone=Qwen3-8B | 2025.12 | 97.5 | - |
| RePro | Backbone=Qwen3-8B, Bas... | 2025.12 | 97.2 | - |
| Original | Backbone=Qwen3-8B | 2025.12 | 96.8 | - |
| GR³ | Category=Performance-o... | 2026.03 | 89.3 | 2,214 |
| DLER-R1-1.5B | Category=Length-orient... | 2026.03 | 87.2 | 1,783 |
| GRPO | Category=Performance-o... | 2026.03 | 85.6 | 7,138 |
| Laser-DE-L4096-1.5B | Category=Length-orient... | 2026.03 | 84.6 | 1,931 |
| DeepSeek-R1-Distill-1.5B | Category=Initial model | 2026.03 | 83.9 | 5,399 |
| LCR1-1.5B | Category=Length-orient... | 2026.03 | 81.9 | 2,520 |
| AdaptThink-1.5B | Category=Length-orient... | 2026.03 | 80.9 | 1,649 |
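Only the 2026.03 rows report a token count, so those are the rows where accuracy can be weighed against output length. One way to read the two columns together is to find the entries that no other entry beats on both axes, i.e. a Pareto front with higher accuracy and fewer tokens both counting as better. The sketch below does this using only the numbers from the Evaluation Results rows above. The names `Entry`, `dominates`, and `pareto_front` are hypothetical, and this analysis is not part of the benchmark itself.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    method: str
    accuracy: float  # Accuracy (Avg@4), %
    tokens: int      # Token Count as reported in the table


# The rows above that report a token count (2026.03 entries).
ENTRIES = [
    Entry("GR³", 89.3, 2214),
    Entry("DLER-R1-1.5B", 87.2, 1783),
    Entry("GRPO", 85.6, 7138),
    Entry("Laser-DE-L4096-1.5B", 84.6, 1931),
    Entry("DeepSeek-R1-Distill-1.5B", 83.9, 5399),
    Entry("LCR1-1.5B", 81.9, 2520),
    Entry("AdaptThink-1.5B", 80.9, 1649),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(entries: list[Entry]) -> list[Entry]:
    """Entries not dominated by any other entry, sorted by ascending token count."""
    front = [e for e in entries if not any(dominates(o, e) for o in entries)]
    return sorted(front, key=lambda e: e.tokens)


for e in pareto_front(ENTRIES):
    print(f"{e.method}: {e.accuracy} @ {e.tokens} tokens")
```

On these rows the front is AdaptThink-1.5B, DLER-R1-1.5B, and GR³. Every other entry is matched or beaten by one of them on both accuracy and token count.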