Mathematical Reasoning on GSM8K (test) (Accuracy, Token count)
[Chart: best result over time. Latest point: S-GRPO, Accuracy 96.3 (Apr 8, 2026). Selectable metrics: Accuracy, Token count.]
Updated 9d ago
Evaluation Results
Method               Backbone    Date     Accuracy (%)  Token count
S-GRPO               Qwen3-14B   2026.04  96.3          952
Vanilla              Qwen3-14B   2026.04  96.2          1,672
DTSR                 Qwen3-14B   2026.04  96.2          849
RL + Length Penalty  Qwen3-14B   2026.04  95.8          1,090
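
The results above pair each method's accuracy with its token count, so one natural reading is to compare every method against the Vanilla row. The following is a minimal sketch of that arithmetic, not part of the benchmark itself: the choice of Vanilla as the baseline and the helper names `token_reduction_pct`, `accuracy_delta`, and `compare_to_baseline` are assumptions made for illustration. The figures are copied from the listed results.

```python
# Figures copied from the Evaluation Results listing (Backbone=Qwen3-14B).
# Each entry maps method name -> (accuracy in percent, token count).
RESULTS = {
    "S-GRPO": (96.3, 952),
    "Vanilla": (96.2, 1672),
    "DTSR": (96.2, 849),
    "RL + Length Penalty": (95.8, 1090),
}


def token_reduction_pct(tokens, baseline_tokens):
    """Percentage of tokens saved relative to the baseline (hypothetical helper)."""
    return round(100.0 * (baseline_tokens - tokens) / baseline_tokens, 1)


def accuracy_delta(accuracy, baseline_accuracy):
    """Accuracy change in percentage points relative to the baseline."""
    return round(accuracy - baseline_accuracy, 1)


def compare_to_baseline(results, baseline="Vanilla"):
    """Return {method: (accuracy delta, token reduction %)} for non-baseline rows.

    Treating "Vanilla" as the baseline is an assumption of this sketch.
    """
    base_acc, base_tokens = results[baseline]
    return {
        name: (
            accuracy_delta(acc, base_acc),
            token_reduction_pct(tokens, base_tokens),
        )
        for name, (acc, tokens) in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (d_acc, saved) in compare_to_baseline(RESULTS).items():
        print(f"{name}: accuracy {d_acc:+.1f} pts, tokens saved {saved:.1f}%")
```

Read this way, a method is attractive when its accuracy delta stays near zero while its token reduction is large; the sketch only restates the listed numbers in relative terms and makes no claim about how they were measured.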