Arithmetic Reasoning on GSM8K (Pass@1, FLOPS)
[Chart: Pass@1 and FLOPS over time. Top result: MFS (Ours), 87.64 Pass@1, Jan 21, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | Pass@1 | FLOPS |
|---|---|---|---|---|
| MFS (Ours) | LLaMA3.1-8B-I... | 2026.01 | 87.64 | - |
| ϕ-Decoding | LLaMA3.1-8B-I... | 2026.01 | 86.58 | - |
| Predictive Decoding | LLaMA3.1-8B-I... | 2026.01 | 81.43 | - |
| MCTS | LLaMA3.1-8B-I... | 2026.01 | 80.44 | - |
| Tree-of-Thoughts | LLaMA3.1-8B-I... | 2026.01 | 75.74 | - |
| Guided Decoding | LLaMA3.1-8B-I... | 2026.01 | 75.51 | - |
| Auto-Regressive (CoT) | LLaMA3.1-8B-I... | 2026.01 | 70.28 | - |
| MFS (Ours) | Mistral-v0.3-... | 2026.01 | 61.64 | - |
| ϕ-Decoding | Mistral-v0.3-... | 2026.01 | 60.42 | - |
| MCTS | Mistral-v0.3-... | 2026.01 | 60.12 | - |
| Predictive Decoding | Mistral-v0.3-... | 2026.01 | 58 | - |
| Tree-of-Thoughts | Mistral-v0.3-... | 2026.01 | 53.9 | - |
| Guided Decoding | Mistral-v0.3-... | 2026.01 | 53.9 | - |
| Auto-Regressive (CoT) | Mistral-v0.3-... | 2026.01 | 49.05 | - |