Arithmetic Reasoning on GSM8K (4-shot CoT)
Top result: ADAFUSE (Top-2 Base), 90.25 Accuracy (Jan 9, 2026)
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| ADAFUSE (Top-2 Base) | Base model selection s... | 2026.01 | 90.25 |
| Qwen3-8B | Model Type=Base Model | 2026.01 | 89.99 |
| LLaMA-3.1-8B-Instruct | Model Type=Base Model | 2026.01 | 81.05 |
| ADAFUSE (Fixed Base) | Base model selection s... | 2026.01 | 79.15 |
| InternLM3-8B-Instruct | Model Type=Base Model | 2026.01 | 76.72 |
| DEEPEN | Base model selection s... | 2026.01 | 67.63 |
| UniTE | Base model selection s... | 2026.01 | 64.59 |
| SWEETSPAN | Base model selection s... | 2026.01 | 63.38 |
| LLM-BLENDER | Base model selection s... | 2026.01 | 62.55 |
| Mistral-7B-Instruct-v0.3 | Model Type=Base Model | 2026.01 | 61.71 |