FineWeb-Edu

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb-Edu (test) | Perplexity (Test) | 16.46 | 58 |
| Language Modeling | FineWeb-Edu (val) | Perplexity | 8.71 | 51 |
| Language Modeling | FineWeb-Edu 500M-token (val) | Valid Loss | 2.221 | 18 |
| Unconditional Generation | FineWeb-Edu Mask source 170M-parameter (train) | Entropy | 8.5 | 17 |
| Unconditional Generation | FineWeb-Edu Uniform source 170M-parameter (train) | Entropy | 7 | 17 |
| Language Modeling | FineWeb-EDU (train) | Loss | 2.993 | 16 |
| Language Modeling | FineWeb-Edu 100B (val) | CE Loss | 2.62 | 13 |
| Soft Search | FineWeb-Edu English, 1.4T tokens (test) | Similarity Score | 100 | 12 |
| Language Modeling | FineWeb-Edu 100B (eval) | Perplexity | 13.75 | 9 |
| In-context Learning | Fineweb-Edu 16.8B tokens | ARC-c Accuracy | 36.86 | 8 |
| Language Modeling | Fineweb-edu distillation 8B to 300M | LM Loss | 2.74 | 7 |
| Language Modeling | FineWeb-Edu 10B (held-out last unused bin) | Perplexity | 15.34 | 5 |
| Language Modeling | FineWeb-Edu 10B (val) | Validation CE | 3.01 | 5 |
| Language Modeling | FineWeb-Edu 20B tokens (val) | Final PPL | 15.03 | 3 |
| Language Modeling | FineWeb-edu 12 text passages (held-out) | Average Loss | 1.683 | 3 |
| Language Modeling | FineWeb-Edu 1.4B tokens (val) | Loss | 3.271 | 3 |