Truncated CoT Answering on MuSR
[Chart: AOC by method. Top entry: MAT-Steer, 33.6 AOC, Feb 18, 2026]
Evaluation Results
Method              Backbone    Date      AOC
MAT-Steer           Qwen3-14B   2026.02   33.6
Original            Qwen3-14B   2026.02   33.2
Faithfulness Only   Qwen3-14B   2026.02   33.0
Balanced Rewards    Qwen3-14B   2026.02   32.8
REMUL               Qwen3-14B   2026.02   32.5
Correctness Only    Qwen3-14B   2026.02   31.7
Hint Optimized      Qwen3-14B   2026.02   30.4