Mathematical Reasoning on GSM8K (pass@K)
[Chart: pass@K over time. Top result: RECAP, 97.77 pass@K, Oct 1, 2025.]
Updated 6d ago
Evaluation Results
| Method    | Backbone   | Date    | pass@K |
|-----------|------------|---------|--------|
| RECAP     | DSQwen-14B | 2025.10 | 97.77  |
| DAPO      | DSQwen-14B | 2025.10 | 97.19  |
| SafeChain | DSQwen-14B | 2025.10 | 96.44  |
| SFT       | DSQwen-14B | 2025.10 | 95.9   |
| STAR      | DSQwen-14B | 2025.10 | 95.86  |
| Original  | DSQwen-14B | 2025.10 | 95.2   |
| RECAP     | DSLlama-8B | 2025.10 | 93.72  |
| DAPO      | DSLlama-8B | 2025.10 | 93.71  |
| SafeChain | DSLlama-8B | 2025.10 | 91.32  |
| SFT       | DSLlama-8B | 2025.10 | 91.32  |
| STAR      | DSLlama-8B | 2025.10 | 90.74  |
| Original  | DSLlama-8B | 2025.10 | 90.32  |