Adding Mistake on MuSR
Best result: 0.731 AOC (Faithfulness Only)
[Chart: AOC by date; y-axis ticks from 0.68628 to 0.72111; data point at Feb 18, 2026]
Updated 4d ago
Evaluation Results

Method             Backbone   Date     AOC
Faithfulness Only  Qwen3-14B  2026.02  0.731
MAT-Steer          Qwen3-14B  2026.02  0.722
REMUL              Qwen3-14B  2026.02  0.722
Balanced Rewards   Qwen3-14B  2026.02  0.712
Original           Qwen3-14B  2026.02  0.708
Correctness Only   Qwen3-14B  2026.02  0.694
Hint Optimized     Qwen3-14B  2026.02  0.688