Graduate-level Factual Reasoning on GPQA (Pass@1, FLOPS)
[Chart: Pass@1 and FLOPS on GPQA; highlighted point: MFS (Ours), 34.9 Pass@1, Jan 21, 2026]
Evaluation Results
Method                 Configuration                Date     Pass@1  FLOPS
---------------------  ---------------------------  -------  ------  -----
MFS (Ours)             Backbone=LLaMA3.1-8B-I...    2026.01  34.9    -
ϕ-Decoding             Backbone=LLaMA3.1-8B-I...    2026.01  34.6    -
Tree-of-Thoughts       Backbone=LLaMA3.1-8B-I...    2026.01  31.25   -
Predictive Decoding    Backbone=LLaMA3.1-8B-I...    2026.01  31.03   -
Guided Decoding        Backbone=LLaMA3.1-8B-I...    2026.01  30.58   -
MFS (Ours)             Backbone=Mistral-v0.3-...    2026.01  30.58   -
ϕ-Decoding             Backbone=Mistral-v0.3-...    2026.01  29.24   -
Guided Decoding        Backbone=Mistral-v0.3-...    2026.01  27.46   -
Auto-Regressive (CoT)  Backbone=LLaMA3.1-8B-I...    2026.01  26.56   -
Tree-of-Thoughts       Backbone=Mistral-v0.3-...    2026.01  26.34   -
MCTS                   Backbone=LLaMA3.1-8B-I...    2026.01  24.11   -
Auto-Regressive (CoT)  Backbone=Mistral-v0.3-...    2026.01  23.88   -
MCTS                   Backbone=Mistral-v0.3-...    2026.01  22.77   -
Predictive Decoding    Backbone=Mistral-v0.3-...    2026.01  22.1    -