Language Generation on Dolly databricks 15k (test)
[Chart: ROUGE-L over time. Top result: 29.7 (Teacher), Mar 4, 2026. Updated 1mo ago.]
Evaluation Results
Method       Model   #Params   Date      ROUGE-L
Teacher      LLaMA   13B       2026.03   29.7
VQAE         LLaMA   7B        2026.03   28.2
Teacher      GPT-2   1.5B      2026.03   27.6
SeqKD        LLaMA   7B        2026.03   27.5
KD           LLaMA   7B        2026.03   27.4
SFT w/o KD   LLaMA   7B        2026.03   26.3
KD           GPT-2   760M      2026.03   25.9
VQAE         GPT-2   760M      2026.03   25.7
SeqKD        GPT-2   760M      2026.03   25.6
SFT w/o KD   GPT-2   760M      2026.03   25.4
VQAE         GPT-2   120M      2026.03   23.5
SFT w/o KD   GPT-2   120M      2026.03   23.3
KD           GPT-2   120M      2026.03   22.8
SeqKD        GPT-2   120M      2026.03   22.7