Mathematical Reasoning on GSM8K (pass@1, Avg Rank)
[Chart: Pass@1 over time, Nov 22, 2025 to Apr 14, 2026; selectable metrics: Pass@1, Average Rank. Top result: GRPO, 88.7 Pass@1.]
Updated 12d ago
Evaluation Results
Method                                     Links    Pass@1  Average Rank
GRPO                                       2026.04  88.7    2.43
Reinforce++                                2026.04  86.58   4.57
SEARL                                      2026.04  86.2    1.43
distillm2                                  2026.04  85.7    -
VCRD-Prob (approach=PRM-free)              2026.04  84.8    -
ARPO                                       2026.04  82.41   3.57
DAPO                                       2026.04  80.59   3
Qwen3-8B-Base (Seed=Avg)                   2025.11  70.93   -
P-POTS+Mirror (Backbone=LLaDA-8B-Inst...)  2025.11  60.53   -
Clipped (Backbone=LLaDA-8B-Inst...)        2025.11  56.18   -
MIRROR (Backbone=LLaDA-8B-Inst...)         2025.11  53.7    -
TIR Prompt                                 2026.04  22.59   5.29
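
The two metric columns above can be illustrated with a small sketch. It assumes Pass@1 is the percentage of problems solved with a single sampled answer, and that Average Rank is the mean of a method's rank over several benchmarks (rank 1 = highest score). The page does not state which benchmarks enter the average, so the helper names and the toy data below are hypothetical and not taken from the leaderboard.

```python
from statistics import mean


def pass_at_1(is_correct):
    """Percentage of problems whose single sampled answer is correct."""
    return 100.0 * sum(is_correct) / len(is_correct)


def average_rank(scores_by_benchmark, method):
    """Mean rank of `method` over benchmarks (rank 1 = highest score).

    Assumption: ties are not handled specially; this is only a sketch.
    """
    ranks = []
    for scores in scores_by_benchmark.values():
        # Sort method names by score, best first, then look up the position.
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ordered.index(method) + 1)
    return mean(ranks)


# Toy data for illustration only (hypothetical methods and benchmarks).
toy_outcomes = [True, True, False, True]
toy_scores = {
    "bench_a": {"m1": 80.0, "m2": 75.0, "m3": 60.0},
    "bench_b": {"m1": 50.0, "m2": 55.0, "m3": 40.0},
}

print(pass_at_1(toy_outcomes))         # 75.0
print(average_rank(toy_scores, "m1"))  # 1.5
```

Under these assumptions a method can lead on Pass@1 for one benchmark without having the best Average Rank, which matches the table: GRPO has the highest Pass@1 while SEARL has the lowest (best) Average Rank.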