Truncated CoT Answering on BBEH
[Chart: AOC by date; top entry Faithfulness Only at 0.665 AOC, Feb 18, 2026]
Evaluation Results

Method              Backbone    Date      AOC
Faithfulness Only   Qwen3-14B   2026.02   0.665
MAT-Steer           Qwen3-14B   2026.02   0.644
REMUL               Qwen3-14B   2026.02   0.626
Original            Qwen3-14B   2026.02   0.58
Balanced Rewards    Qwen3-14B   2026.02   0.574
Correctness Only    Qwen3-14B   2026.02   0.518
Hint Optimized      Qwen3-14B   2026.02   0.503
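The gaps between entries can be checked with a short Python sketch. It transcribes the AOC values from the Evaluation Results table and reports each method's difference from the row labeled Original. Using Original as the reference point is an assumption made for illustration, since the page does not say which row is the baseline. The names `RESULTS` and `deltas_vs` are hypothetical.

```python
# AOC scores transcribed from the Evaluation Results table
# (every row uses Backbone=Qwen3-14B, dated 2026.02).
RESULTS = {
    "Faithfulness Only": 0.665,
    "MAT-Steer": 0.644,
    "REMUL": 0.626,
    "Original": 0.58,
    "Balanced Rewards": 0.574,
    "Correctness Only": 0.518,
    "Hint Optimized": 0.503,
}


def deltas_vs(baseline: str, results: dict[str, float]) -> list[tuple[str, float]]:
    """Return (method, AOC minus baseline AOC) pairs, sorted by AOC descending.

    Assumption: `baseline` names the row to treat as the reference point;
    the leaderboard itself does not designate one.
    """
    base = results[baseline]
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    # Round to 3 decimals to match the precision reported on the page.
    return [(name, round(score - base, 3)) for name, score in ranked]


if __name__ == "__main__":
    for name, delta in deltas_vs("Original", RESULTS):
        print(f"{name:<18} {delta:+.3f}")
```

With Original as the reference, Faithfulness Only sits 0.085 AOC above it and Hint Optimized 0.077 below it. Any other row can be passed as `baseline` to compare against a different entry.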