Reasoning on CounterBench
Top entry: FLEx, Error Rate 0.0359 (Jan 7, 2026)
[Chart: Error Rate over time]
Evaluation Results
Method  Backbone Model  Date     Error Rate
FLEx    Gemma-1B        2026.01  0.0359
FLEx    Qwen-1.5B       2026.01  0.14
FLEx    Qwen-3B         2026.01  0.1535
FLEx    Gemma-27B       2026.01  0.1958
FLEx    Qwen-32B        2026.01  0.2414
FLEx    Qwen-14B        2026.01  0.2538
FLEx    Gemma-4B        2026.01  0.2678
FLEx    Qwen-7B         2026.01  0.2691
FLEx    Qwen-0.5B       2026.01  0.2706
FLEx    Qwen-72B        2026.01  0.3547
FLEx    Gemma-12B       2026.01  0.5176