Open-form mathematical reasoning on GSM8K
Top result: S2L, 52.45 accuracy (Oct 8, 2025)
Updated 19d ago
Evaluation Results

Method                        Details                      Date      Accuracy
S2L                           Backbone=LLAMA-2-7B, C...    2025.10   52.45
TRIM                          Backbone=LLAMA-2-7B, C...    2025.10   52.23
LESS                          Backbone=LLAMA-2-7B, C...    2025.10   52.1
Full-data Fine-tuning         Backbone=LLAMA-2-7B, C...    2025.10   51.15
TAGCOS                        Backbone=LLAMA-2-7B, C...    2025.10   50.2
Random                        Backbone=LLAMA-2-7B, C...    2025.10   48.16
Pretrained (no Fine-tuning)   Backbone=LLAMA-2-7B, C...    2025.10   3.1
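
The gaps between methods are easier to read as differences from a common reference. The following is a minimal sketch in Python, assuming the accuracy values listed above and taking Full-data Fine-tuning as the reference point. That choice of reference and the helper name `gaps_to_baseline` are illustrative assumptions, not part of the benchmark page.

```python
# Accuracy values copied from the results listed above.
results = {
    "S2L": 52.45,
    "TRIM": 52.23,
    "LESS": 52.1,
    "Full-data Fine-tuning": 51.15,
    "TAGCOS": 50.2,
    "Random": 48.16,
    "Pretrained (no Fine-tuning)": 3.1,
}


def gaps_to_baseline(scores, baseline="Full-data Fine-tuning"):
    """Return (method, accuracy, delta) tuples sorted by accuracy, best first.

    `baseline` is a hypothetical parameter: the reference method whose
    accuracy is subtracted from every other entry.
    """
    base = scores[baseline]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, acc, round(acc - base, 2)) for name, acc in ranked]


# Print one row per method: name, accuracy, signed gap to the reference.
for name, acc, delta in gaps_to_baseline(results):
    print(f"{name:<28} {acc:6.2f} {delta:+6.2f}")
```

Under these assumptions, the three methods above the reference differ from it by roughly one accuracy point each, while the pretrained model without fine-tuning sits far below every fine-tuned entry.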