Reasoning on MATH (Accuracy, Latency, Speedup)
[Chart: Accuracy (%) over time, Mar 13, 2026 to May 8, 2026. Highlighted result: LR, 92.84. Selectable metrics: Accuracy (%), Average Length, Speedup.]
Updated 24d ago
Evaluation Results

Method                | Configuration             | Date    | Accuracy (%) | Average Length | Speedup
----------------------|---------------------------|---------|--------------|----------------|--------
LR                    | Target Model=Qwen / Qw... | 2026.03 | 92.84        | 9.56           | 1.2
Target Model          | Target Model=Qwen / Qw... | 2026.03 | 91.54        | 1              | 1
Online-LR             | Target Model=Qwen / Qw... | 2026.03 | 91.37        | 10.63          | 1.24
OSD-LR                | Target Model=Qwen / Qw... | 2026.03 | 89.87        | 6.21           | 1.1
Draft Model           | Draft Model=Qwen / Qwe... | 2026.03 | 60.66        | 1              | 3.54
Rubric-grounded GRPO  | Checkpoint=Best by hel... | 2026.05 | 52.88        | -              | -
Llama-3.1-8B-Instruct | Checkpoint=Base           | 2026.05 | 50.06        | -              | -