Mathematical Reasoning on GSM8K MC
[Chart: Accuracy by date (Mar 15, 2026); top result: SC, 95.68]
Updated 10d ago
Evaluation Results
Method  Model         Links    Accuracy
SC      Olmo-2-13B    2026.03  95.68
IoT     GPT-4o mini   2026.03  95
CoT     GPT-4o mini   2026.03  94.74
IoT     Olmo-2-13B    2026.03  93.1
SC      Llama-3.3-8B  2026.03  91.89
IoT     Olmo-2-7B     2026.03  91.66
CoT     Olmo-2-13B    2026.03  91.36
IoT     Llama-3.3-8B  2026.03  90.67
EoT     Olmo-2-13B    2026.03  90.14
SC      Olmo-2-7B     2026.03  89.39
CoT     Olmo-2-7B     2026.03  89.08
CoT     Llama-3.3-8B  2026.03  87.41
EoT     Olmo-2-7B     2026.03  87.06
EoT     Llama-3.3-8B  2026.03  84.31