Reasoning on GSM8K (ERR)
Best result: FLEx, Error Rate 10
Chart: Error Rate over time (results dated Jan 7, 2026)
Evaluation Results

Method   Backbone Model   Date      Error Rate
FLEx     Qwen-0.5B        2026.01   10
FLEx     Gemma-1B         2026.01   70
FLEx     Gemma-4B         2026.01   417
FLEx     Qwen-3B          2026.01   563
FLEx     Gemma-12B        2026.01   625
FLEx     Qwen-1.5B        2026.01   625
FLEx     Qwen-7B          2026.01   775
FLEx     Gemma-27B        2026.01   2,208
FLEx     Qwen-72B         2026.01   2,963
FLEx     Qwen-14B         2026.01   3,784
FLEx     Qwen-32B         2026.01   5,385