
Open-form mathematical reasoning on GSM8K

Top result: 52.45 accuracy (S2L)

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
S2L 2025.10		52.45
TRIM 2025.10		52.23
LESS 2025.10		52.1
Full-data Fine-tuning 2025.10		51.15
TAGCOS 2025.10		50.2
Random 2025.10		48.16
Pretrained (no Fine-tuning) 2025.10		3.1