Mathematical Reasoning on GSM8K (Acc, SR, JB, HB)
[Chart: Accuracy, Mar 8, 2026. Top result: SFT, 81.65. Available metrics: Accuracy, Success Rate, Jump Back, Hardness Bias.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Accuracy | Success Rate | Jump Back | Hardness Bias |
|---|---|---|---|---|---|---|
| SFT | Backbone=Qwen2.5-7B-In... | 2026.03 | 81.65 | 71.25 | 52.5 | 94.5 |
| PACT | Backbone=Qwen2.5-7B-In... | 2026.03 | 80.89 | 9.27 | 11 | 29.5 |
| AsFT | Backbone=Qwen2.5-7B-In... | 2026.03 | 80.5 | 9.27 | 28 | 48 |
| Safe LoRA | Backbone=Qwen2.5-7B-In... | 2026.03 | 73.69 | 15.65 | 20 | 44.5 |
| Initial | Backbone=Qwen2.5-7B-In... | 2026.03 | 56.56 | 1.6 | 3 | 7 |
| Constrained SFT | Backbone=Qwen2.5-7B-In... | 2026.03 | 55.95 | 10.92 | 30.5 | 80 |