Reasoning on ReasonIF
Top entry: FLEx, Error Rate (ERR) 6.7, Jan 7, 2026
[Chart: Error Rate (ERR) over time]
Updated 4d ago
Evaluation Results
Method  Backbone Model  Date     Error Rate (ERR)
FLEx    Gemma-27B       2026.01  6.7
FLEx    Qwen-1.5B       2026.01  6.7
FLEx    Qwen-3B         2026.01  9.42
FLEx    Qwen-0.5B       2026.01  9.77
FLEx    Gemma-1B        2026.01  22.08
FLEx    Gemma-12B       2026.01  27.1
FLEx    Gemma-4B        2026.01  32
FLEx    Qwen-7B         2026.01  39.75
FLEx    Qwen-72B        2026.01  51.28
FLEx    Qwen-14B        2026.01  82.56
FLEx    Qwen-32B        2026.01  83.08