Mathematical Reasoning (Fine-tuning) on GSM8K (test)
[Chart: Test Perplexity. Leading entry: BAOC0.5 at 4.598 (May 6, 2026)]
Updated 27d ago
Evaluation Results
Method      Details                      Date      Test Perplexity
BAOC0.5     Architecture=GPT2-smal...    2026.05   4.598
AdamW16     Architecture=GPT2-smal...    2026.05   4.615
COSMOS      Architecture=GPT2-smal...    2026.05   4.737
Adam-mini   Architecture=GPT2-smal...    2026.05   4.995
Muon        Architecture=GPT2-smal...    2026.05   6.238
Adam-mini   Architecture=T5-base,...     2026.05   8.898
BAOC0.5     Architecture=T5-base,...     2026.05   9.137
AdamW16     Architecture=T5-base,...     2026.05   9.312
COSMOS      Architecture=T5-base,...     2026.05   9.921
Muon        Architecture=T5-base,...     2026.05   13.049