Mathematical Reasoning on GSM8K (Strict-match Accuracy)
Chart: strict-match accuracy over time (as of Jun 1, 2026). Top result: S-SPPO, 78.9.
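The page does not define strict-match accuracy. A common convention for GSM8K, used for example by the "strict-match" filter in EleutherAI's lm-evaluation-harness, counts a prediction as correct only when it states its final answer in GSM8K's `#### <number>` format and that number equals the reference answer. The sketch below assumes that convention. The helper names `extract_strict` and `strict_match_accuracy` and the normalization (dropping commas and a trailing period) are illustrative simplifications, not this leaderboard's actual evaluation code.

```python
import re

# Assumption: answers must appear as "#### <number>", the format GSM8K
# reference solutions use. Loosely modeled on lm-evaluation-harness's
# "strict-match" filter; not this leaderboard's own code.
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")


def extract_strict(text):
    """Return the first '#### <number>' answer in text, or None if absent."""
    match = STRICT_PATTERN.search(text)
    if match is None:
        return None
    # Simplified normalization: drop thousands separators and a trailing period.
    return match.group(1).replace(",", "").rstrip(".")


def strict_match_accuracy(predictions, references):
    """Percentage of predictions whose strictly extracted answer equals the reference's."""
    if not references:
        raise ValueError("references must be non-empty")
    correct = 0
    for pred, ref in zip(predictions, references):
        pred_ans = extract_strict(pred)
        ref_ans = extract_strict(ref)
        # A prediction without the "####" marker never counts, even if the number is right.
        if pred_ans is not None and pred_ans == ref_ans:
            correct += 1
    return 100.0 * correct / len(references)
```

Under this convention, a model that writes "The answer is 3" instead of "#### 3" scores zero on that item, which is why strict-match accuracy can sit below more lenient answer-extraction metrics.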
Evaluation Results
| Method | Configuration | Date | Strict-match Accuracy (%) |
|---|---|---|---|
| S-SPPO | Backbone=Llama-3-8B, I... | 2026.06 | 78.9 |
| SPPO | Backbone=Llama-3-8B, I... | 2026.06 | 78.2 |
| Llama-3-8B Base | Backbone=Llama-3-8B, O... | 2026.06 | 76.4 |
| S-SPPO | Backbone=Mistral-7B, I... | 2026.06 | 44.2 |
| SPPO | Backbone=Mistral-7B, I... | 2026.06 | 44.0 |
| Mistral-7B Base | Backbone=Mistral-7B, O... | 2026.06 | 42.8 |
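To compare methods within each backbone, a small sketch can compute each method's gain over the matching base-model row, in percentage points (assuming the scores are percentages). The values are copied from the results table; the dictionary layout, the "Base" key, and the function name `gains_over_base` are illustrative choices, not part of the leaderboard.

```python
# Strict-match accuracy copied from the Evaluation Results table.
# "Base" stands for the "Llama-3-8B Base" / "Mistral-7B Base" rows (illustrative key).
RESULTS = {
    "Llama-3-8B": {"Base": 76.4, "SPPO": 78.2, "S-SPPO": 78.9},
    "Mistral-7B": {"Base": 42.8, "SPPO": 44.0, "S-SPPO": 44.2},
}


def gains_over_base(results):
    """Return {backbone: {method: accuracy minus base accuracy}}, rounded to 0.1 points."""
    gains = {}
    for backbone, scores in results.items():
        base = scores["Base"]
        gains[backbone] = {
            method: round(score - base, 1)  # round away float noise such as 1.7999...
            for method, score in scores.items()
            if method != "Base"
        }
    return gains


for backbone, deltas in gains_over_base(RESULTS).items():
    print(backbone, deltas)
# Llama-3-8B {'SPPO': 1.8, 'S-SPPO': 2.5}
# Mistral-7B {'SPPO': 1.2, 'S-SPPO': 1.4}
```

On these numbers, S-SPPO's margin over SPPO is small on both backbones: 0.7 points on Llama-3-8B and 0.2 points on Mistral-7B.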