Multi-hop QA on 2WikiMHQA
[Chart: Average Violation by date. Data point shown: SAVER, 0.56, Apr 9, 2026. Metrics available: Average Violation, Violation Frequency Rate (VFR), Post-Response Score, User Satisfaction Rate (USR).]
Updated 8d ago
Evaluation Results
Method      Model Scale   Date     Average Violation  Violation Frequency Rate (VFR)  Post-Response Score  User Satisfaction Rate (USR)
SAVER       LLaMA-3.1-8B  2026.04  0.56               72.34                           8                    13.84
MAD         LLaMA-3.1-8B  2026.04  1.81               32.78                           -                    28.82
CoT         LLaMA-3.1-8B  2026.04  2.21               17.41                           -                    32.11
VANILLA LM  LLaMA-3.1-8B  2026.04  2.83               6.58                            -                    53.19