Pre-training corpus

Benchmarks

Task Name              Dataset Name                              SOTA Result                     Trend
Language Modeling      Pre-training corpus (train)               Perplexity 15.71                20
Language Modeling      Pre-training corpus                       Loss 1.577                      9
Next token prediction  Pre-training corpus (train)               Token Accuracy 66.4             9
Language Modeling      1.3B 26B-token pre-training corpus (val)  Validation Cross-Entropy 2.077  3