Mathematical Reasoning on GSM8K (CoT)
[Chart: Accuracy over time. Top entry: Average, 99.49 accuracy (Mar 3, 2026).]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Accuracy |
| --- | --- | --- | --- |
| Average | Backbone=LLaMA-3 | 2026.03 | 99.49 |
| Task Arithmetic | Backbone=LLaMA-3 | 2026.03 | 98.77 |
| ACE | Backbone=LLaMA-3 | 2026.03 | 94.26 |
| math | Model Type=Single-task... | 2026.03 | 74 |
| Average | Model Type=Merged mode... | 2026.03 | 73.62 |
| Task Arithmetic | Model Type=Merged mode... | 2026.03 | 73.09 |
| ACE | Model Type=Merged mode... | 2026.03 | 69.75 |