Adding Mistake on FOLIO
[Chart: AOC. Top result: Faithfulness Only, 0.714 (Feb 18, 2026).]
Evaluation Results
Method              Backbone    Date      AOC
------------------  ----------  --------  -----
Faithfulness Only   Qwen3-14B   2026.02   0.714
REMUL               Qwen3-14B   2026.02   0.703
MAT-Steer           Qwen3-14B   2026.02   0.674
Balanced Rewards    Qwen3-14B   2026.02   0.671
Original            Qwen3-14B   2026.02   0.667
Correctness Only    Qwen3-14B   2026.02   0.648
Hint Optimized      Qwen3-14B   2026.02   0.641