Language Modeling on Fineweb-edu distillation 8B to 300M

2.74 LM Loss

Random Sampling KD

[Chart: LM Loss, y-axis ticks 2.7356, 2.7653, 2.795, 2.8247; Mar 21, 2025]
Updated 4d ago

Evaluation Results

LM Loss   (unlabeled)   (unlabeled)   Date
2.74      100           0.9           2025.03
2.75      100           0.7           2025.03
2.77      57            1.9           2025.03
2.78      -47           1.7           2025.03
2.8       5             2.2           2025.03
2.81      0             1.2           2025.03
2.85      -78           1.4