Mathematical Reasoning on GSM8K (test set): Accuracy and Average Response Length
Figure: Accuracy chart (x-axis dated Jan 7, 2026; top result: RPAM, 95.7). Chart views: Accuracy, Avg Response Length.
Evaluation Results
| Method            | Method Category | Date    | Accuracy (%) | Avg Response Length |
|-------------------|-----------------|---------|--------------|---------------------|
| RPAM              | Data-d...       | 2026.01 | 95.7         | 1,086               |
| CoD               | Prompt...       | 2026.01 | 95.6         | 1,017               |
| Average Merging   | Data-f...       | 2026.01 | 95.4         | 1,088               |
| Ada-R1            | Traini...       | 2026.01 | 95.3         | 1,161               |
| Qwen3-4B-Thinking | Base M...       | 2026.01 | 95.2         | 1,521               |
| Task Arithmetic   | Data-f...       | 2026.01 | 95.2         | 1,181               |
| DARE-Linear       | Data-f...       | 2026.01 | 95.1         | 1,641               |
| AIM               | Data-d...       | 2026.01 | 95.1         | 1,060               |
| TIES Merging      | Data-f...       | 2026.01 | 94.8         | 1,170               |
| ACM               | Data-d...       | 2026.01 | 94.7         | 925                 |
| Qwen3-4B-Instruct | Base M...       | 2026.01 | 93.0         | 374                 |