Arithmetic Reasoning on AQuA, GSM8K, MAWPS, SVAMP
[Chart: AQuA Accuracy. Top result: 62.2 (Qwen2.5-14B), Dec 26, 2025. Available metrics: AQuA, GSM8K, MAWPS, SVAMP, and Average Accuracy.]
Updated 3d ago
Evaluation Results

| Method | Details | Date | AQuA Accuracy | GSM8K Accuracy | MAWPS Accuracy | SVAMP Accuracy | Average Accuracy |
|---|---|---|---|---|---|---|---|
| Qwen2.5-14B | Params=14B, Latency (s... | 2025.12 | 62.2 | 78.5 | 92.4 | 87.4 | 80.1 |
| LLMBOOST (3B × 2) | Params=6B (-1B), Laten... | 2025.12 | 60.2 | 74 | 92.4 | 85.3 | 78 |
| LLMBOOST (7B × 2) | Params=14B, Latency (s... | 2025.12 | 59.8 | 78.1 | 94.1 | 88.8 | 80.2 |
| Qwen2.5-7B | Params=7B, Latency (s/... | 2025.12 | 56.8 | 75.1 | 92 | 86.1 | 77.5 |
| LLMBOOST (8B + 3B) | Params=11B, Latency (s... | 2025.12 | 43.7 | 64.4 | 89.7 | 78.6 | 69.1 |
| Llama 3.1-8B | Params=8B, Latency (s/... | 2025.12 | 42.3 | 63.7 | 89.5 | 77.4 | 68.2 |
| LLama-3.2-3B | Params=3B, Latency (s/... | 2025.12 | 31.9 | 49.7 | 84.9 | 65.8 | 58.1 |
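
The page does not say how Average Accuracy is computed. The reported values are consistent with an unweighted mean of the four benchmark accuracies, rounded to one decimal place. That is an inference from the numbers, not a documented definition. A minimal sketch that checks this assumption against the rows above (the names `RESULTS`, `mean_accuracy`, and `mismatches` are hypothetical, introduced only for illustration):

```python
# Assumption: Average Accuracy = unweighted mean of AQuA, GSM8K, MAWPS, SVAMP.
# The source page does not state this; the check below only tests consistency.

# method -> ((AQuA, GSM8K, MAWPS, SVAMP), reported Average Accuracy),
# copied from the results listed above.
RESULTS = {
    "Qwen2.5-14B": ((62.2, 78.5, 92.4, 87.4), 80.1),
    "LLMBOOST (3B × 2)": ((60.2, 74.0, 92.4, 85.3), 78.0),
    "LLMBOOST (7B × 2)": ((59.8, 78.1, 94.1, 88.8), 80.2),
    "Qwen2.5-7B": ((56.8, 75.1, 92.0, 86.1), 77.5),
    "LLMBOOST (8B + 3B)": ((43.7, 64.4, 89.7, 78.6), 69.1),
    "Llama 3.1-8B": ((42.3, 63.7, 89.5, 77.4), 68.2),
    "LLama-3.2-3B": ((31.9, 49.7, 84.9, 65.8), 58.1),
}


def mean_accuracy(scores):
    """Unweighted arithmetic mean of the per-benchmark accuracies."""
    return sum(scores) / len(scores)


def mismatches(results, tol=0.05 + 1e-9):
    """Return the methods whose reported average differs from the mean
    by more than one-decimal rounding (plus a small float margin)."""
    return [
        method
        for method, (scores, reported) in results.items()
        if abs(mean_accuracy(scores) - reported) > tol
    ]


if __name__ == "__main__":
    # An empty list means every row is consistent with the assumption.
    print(mismatches(RESULTS))
```

The tolerance is half a unit in the last reported digit, so exact means such as 80.125 (reported as 80.1) and 77.975 (reported as 78) count as consistent.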