Common-sense Reasoning on ARC Challenge (Pass@1, FLOPS)
[Chart: Pass@1 over time. Top result: MFS (Ours), 86.73 Pass@1, Jan 21, 2026. Selectable metrics: Pass@1, FLOPS.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | Pass@1 | FLOPS |
|---|---|---|---|---|
| MFS (Ours) | LLaMA3.1-8B-I... | 2026.01 | 86.73 | - |
| ϕ-Decoding | LLaMA3.1-8B-I... | 2026.01 | 85.41 | - |
| Predictive Decoding | LLaMA3.1-8B-I... | 2026.01 | 84.56 | - |
| Guided Decoding | LLaMA3.1-8B-I... | 2026.01 | 81.74 | - |
| Tree-of-Thoughts | LLaMA3.1-8B-I... | 2026.01 | 80.72 | - |
| MCTS | LLaMA3.1-8B-I... | 2026.01 | 79.95 | - |
| MFS (Ours) | Mistral-v0.3-... | 2026.01 | 79.95 | - |
| ϕ-Decoding | Mistral-v0.3-... | 2026.01 | 78.16 | - |
| MCTS | Mistral-v0.3-... | 2026.01 | 74.74 | - |
| Tree-of-Thoughts | Mistral-v0.3-... | 2026.01 | 73.63 | - |
| Guided Decoding | Mistral-v0.3-... | 2026.01 | 73.55 | - |
| Predictive Decoding | Mistral-v0.3-... | 2026.01 | 73.55 | - |
| Auto-Regressive (CoT) | Mistral-v0.3-... | 2026.01 | 69.54 | - |
| Auto-Regressive (CoT) | LLaMA3.1-8B-I... | 2026.01 | 58.91 | - |