Adding Mistake on BBEH
Best AOC: 67.2 (Faithfulness Only), Feb 18, 2026
Evaluation Results
Method              Backbone     Date      AOC
Faithfulness Only   Qwen3-14B    2026.02   67.2
REMUL               Qwen3-14B    2026.02   66.1
MAT-Steer           Qwen3-14B    2026.02   64.9
Original            Qwen3-14B    2026.02   62.1
Balanced Rewards    Qwen3-14B    2026.02   61.4
Correctness Only    Qwen3-14B    2026.02   59.8
Hint Optimized      Qwen3-14B    2026.02   58.7