Mathematical Reasoning on GSM8K (Accuracy, Mean Response Tokens, Length Reduction)
[Chart: Accuracy by date. Highlighted point: GRPO, 76.96, May 27, 2026. Metric tabs: Accuracy, Mean Response Tokens, Length Reduction.]
Updated 7d ago
Evaluation Results
Method        Model                      Date     Accuracy (%)  Mean Response Tokens  Length Reduction (%)
GRPO          Model=Qwen2.5-1.5B-Ins...  2026.05  76.96         265.08                -
GRPO+FIRSTN   Model=Qwen2.5-1.5B-Ins...  2026.05  76.85         237.77                10.3
Pair          Model=Qwen2.5-1.5B-Ins...  2026.05  74.44         144.77                45.4
BPPO          Model=Qwen2.5-1.5B-Ins...  2026.05  73.84         122.39                53.8
CPPO          Model=Qwen2.5-1.5B-Ins...  2026.05  73.64         232.62                12.2
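
The page does not say how Length Reduction is computed. The reported values are consistent with the percentage drop in Mean Response Tokens relative to the GRPO row, so the sketch below assumes that definition. The function name `length_reduction` and the choice of GRPO as the baseline are assumptions made for illustration, not something the leaderboard states.

```python
# Mean response tokens per method, copied from the Evaluation Results rows.
mean_tokens = {
    "GRPO": 265.08,
    "GRPO+FIRSTN": 237.77,
    "Pair": 144.77,
    "BPPO": 122.39,
    "CPPO": 232.62,
}


def length_reduction(tokens: float, baseline: float) -> float:
    """Percentage drop in mean response tokens relative to a baseline.

    Hypothetical helper: the formula is inferred from the reported numbers.
    """
    return 100.0 * (baseline - tokens) / baseline


# Assumption: GRPO (the row with "-" in the Length Reduction column) is the baseline.
baseline = mean_tokens["GRPO"]

reductions = {
    method: round(length_reduction(tokens, baseline), 1)
    for method, tokens in mean_tokens.items()
    if method != "GRPO"
}

for method, value in reductions.items():
    print(f"{method}: {value}")
```

Under this assumed definition, the rounded results match the Length Reduction values listed for GRPO+FIRSTN, Pair, BPPO, and CPPO.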