Multi-hop QA on MuSiQue (Violation Metrics)
[Chart: Avg Violation by date. Highlighted entry: SAVER, 0.83 Avg Violation (Apr 9, 2026). Selectable metrics: Avg Violation, Violation Frequency Rate (VFR), Post-Response Score, User Satisfaction Rate (USR).]
Updated 8d ago
Evaluation Results
Method      Model Scale    Date     Avg Violation  Violation Frequency Rate (VFR)  Post-Response Score  User Satisfaction Rate (USR)
SAVER       LLaMA-3.1-8B   2026.04  0.83           69.38                           0.11                 19.73
MAD         LLaMA-3.1-8B   2026.04  2.16           26.17                           -                    36.51
CoT         LLaMA-3.1-8B   2026.04  2.91           13.26                           -                    37.58
VANILLA LM  LLaMA-3.1-8B   2026.04  3.25           5.34                            -                    62.63