Adding Mistake on ZebraLogicBench (ZLB)
Top result: Faithfulness Only, 83.8 AOC (Feb 18, 2026). Updated 4d ago.
Evaluation Results
| Method | Backbone | Date | AOC |
|---|---|---|---|
| Faithfulness Only | Qwen3-14B | 2026.02 | 83.8 |
| REMUL | Qwen3-14B | 2026.02 | 82.6 |
| MAT-Steer | Qwen3-14B | 2026.02 | 80.8 |
| Original | Qwen3-14B | 2026.02 | 79.3 |
| Balanced Rewards | Qwen3-14B | 2026.02 | 79 |
| Correctness Only | Qwen3-14B | 2026.02 | 77.8 |
| Hint Optimized | Qwen3-14B | 2026.02 | 77.1 |