Faithfulness Evaluation on Alpaca 800 samples
[Chart: BLEU by method. Highlighted result: Random, BLEU 62.1 (May 30, 2024). Selectable metrics: BLEU, ROUGE-L Precision, ROUGE-L Recall, ROUGE-L F1, SentenceBert Similarity, PR Score, KL Divergence.]
Evaluation Results
| Method | Model | Date | BLEU | ROUGE-L Precision | ROUGE-L Recall | ROUGE-L F1 | SentenceBert Similarity | PR Score | KL Divergence |
|---|---|---|---|---|---|---|---|---|---|
| Random | Llama-2 (7B-Chat) | 2024.05 | 62.1 | 53.9 | 54 | 53.5 | 82.4 | 87.1 | 0.104 |
| Attention | Llama-2 (7B-Chat) | 2024.05 | 54.6 | 46.2 | 46.4 | 45.8 | 71.3 | 80.2 | 0.168 |
| Last-Attention | Llama-2 (7B-Chat) | 2024.05 | 54 | 44.7 | 45.3 | 44.4 | 71.3 | 78.7 | 0.191 |
| Integrated-Gradient | Llama-2 (7B-Chat) | 2024.05 | 49.7 | 39.5 | 39.6 | 39 | 68.7 | 70.4 | 0.275 |
| JoPA | Llama-2 (7B-Chat) | 2024.05 | 47.9 | 38.3 | 37.6 | 37.2 | 64.2 | 56.5 | 0.479 |
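For readers unfamiliar with the ROUGE-L columns, the following is a minimal sketch of how ROUGE-L precision, recall, and F1 are conventionally derived from the longest common subsequence (LCS) of two token sequences. It is not the benchmark's own scoring code: the function names `lcs_length` and `rouge_l` are hypothetical, and whitespace tokenization is an assumption made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, f1) in [0, 1] for two strings.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_l("the cat sat on the mat", "the cat lay on the mat")
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

The reported values appear to be on a 0 to 100 scale, so a per-sample score from this sketch would be multiplied by 100 before comparison. The reported F1 for Random (53.5) is lower than the harmonic mean of its reported precision and recall (53.9 and 54), which would be consistent with F1 being averaged per sample rather than computed from the averaged columns, though the page does not say so.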