Pre-training

Benchmarks

Task Name                 Dataset Name               Metric                   SOTA Result   Trend
Language Modeling         Pre-training (val)         Validation Loss          1.602         55
Pre-training              Pre-training (evaluation)  Pre-training Eval Loss   3.254         5
Pre-training efficiency   Pre-training               Muon Steps               4,228         4