Knowledge Distillation on The Pile
[Chart: Raw KL Divergence of Student models, latest data point Apr 5, 2026. Available metrics: Raw KL Divergence, Per-Token KL Divergence, Normalized KL Score.]
Updated 12d ago
Evaluation Results
| Method | Configuration | Date | Raw KL Divergence | Per-Token KL Divergence | Normalized KL Score |
|---|---|---|---|---|---|
| Student | dS=1024, Params=405M | 2026.04 | 1,200 | 0.586 | 44.4 |
| Student | dS=768, Params=247M | 2026.04 | 1,335 | 0.652 | 49.4 |
| Student | dS=512, Params=127M | 2026.04 | 1,501 | 0.733 | 55.5 |
| Student | dS=256, Params=45M | 2026.04 | 2,064 | 1.008 | 76.4 |
| Student | dS=128, Params=18M | 2026.04 | 2,703 | 1.32 | 100 |
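
The page does not say how the three metrics relate to each other. The reported values are, however, consistent with a simple reading: the per-token figure matches the raw figure divided by 2,048, and the normalized score matches the raw figure expressed as a percentage of the weakest student's raw figure (dS=128, 18M parameters). Both the 2,048-token sequence length and the normalization rule are assumptions inferred from the numbers, not something the source states. The sketch below, with hypothetical helper names, checks that reading against the reported results.

```python
# Raw KL divergence per student configuration, copied from the results above.
raw_kl = {
    "dS=1024, Params=405M": 1200,
    "dS=768, Params=247M": 1335,
    "dS=512, Params=127M": 1501,
    "dS=256, Params=45M": 2064,
    "dS=128, Params=18M": 2703,
}

# ASSUMPTION: 2,048 tokens per evaluated sequence. This is inferred from the
# ratio of the raw and per-token columns; the source does not state it.
ASSUMED_TOKENS_PER_SEQUENCE = 2048


def per_token_kl(raw, tokens=ASSUMED_TOKENS_PER_SEQUENCE):
    """Average the raw (summed) KL divergence over the assumed token count."""
    return raw / tokens


def normalized_score(raw, reference):
    """Express raw KL as a percentage of a reference raw KL (assumed rule)."""
    return 100.0 * raw / reference


# ASSUMPTION: the reference is the largest raw KL, i.e. the smallest student.
reference = max(raw_kl.values())

derived = {
    name: (round(per_token_kl(raw), 3), round(normalized_score(raw, reference), 1))
    for name, raw in raw_kl.items()
}

for name, (per_token, score) in derived.items():
    print(f"{name}: per-token KL = {per_token}, normalized score = {score}")
```

Under these assumptions, every derived pair agrees with the reported Per-Token KL Divergence and Normalized KL Score columns to the precision shown. Lower is better on all three metrics if this reading is right, since each is a monotone rescaling of the raw KL divergence.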