Faithfulness Evaluation on tldr_news (800 samples)
[Chart: BLEU by method; highlighted point: Random, 79.5 BLEU, May 30, 2024. Selectable metrics: BLEU, ROUGE-L Precision, ROUGE-L Recall, ROUGE-L F1, SentenceBert Similarity, PR Score, KL Divergence.]
Evaluation Results
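| Method              | Model             | Date    | BLEU | ROUGE-L Precision | ROUGE-L Recall | ROUGE-L F1 | SentenceBert Similarity | PR Score | KL Divergence |
|---------------------|-------------------|---------|------|-------------------|----------------|------------|-------------------------|----------|---------------|
| Random              | Llama-2 (7B-Chat) | 2024.05 | 79.5 | 73.8              | 73.9           | 73.8       | 0.92                    | 94.3     | 0.04          |
| Attention           | Llama-2 (7B-Chat) | 2024.05 | 76.1 | 69.5              | 69.8           | 69.5       | 0.9                     | 90.3     | 0.074         |
| Last-Attention      | Llama-2 (7B-Chat) | 2024.05 | 74.9 | 67.8              | 68.2           | 67.9       | 0.879                   | 86.7     | 0.11          |
| Integrated-Gradient | Llama-2 (7B-Chat) | 2024.05 | 69.3 | 60.5              | 61.1           | 60.7       | 0.849                   | 78.3     | 0.175         |
| JoPA                | Llama-2 (7B-Chat) | 2024.05 | 68.7 | 59.5              | 59.1           | 59         | 0.832                   | 58.9     | 0.419         |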