FineWeb

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | FineWeb (val) | Validation Loss: 2.03 | 217 |
| Language Modeling | FineWeb-Edu | PPL: 8.32 | 141 |
| Language Modeling | FineWeb 10B (val) | Validation Perplexity: 19.54 | 56 |
| Causal Faithfulness Evaluation | FineWeb Temporal | nAOPC: 0.985 | 28 |
| Language Modeling | FineWeb 100B (val) | Validation Perplexity: 20.98 | 28 |
| Language Modeling | FineWeb 10B | Validation Perplexity: 20.69 | 28 |
| Language Modeling | FineWeb 10BT (train) | Loss: 2.5966 | 26 |
| Language Modeling | FineWeb-EDU | LM Loss: 2.89 | 19 |
| Language Modeling | Fineweb 100B | Perplexity (PPL): 2.6 | 19 |
| Language Modeling | FineWeb 10B 8K sequence length | Validation Loss: 3.3171 | 16 |
| Language Modeling | FineWeb 10B 4K sequence length | Validation Loss: 3.2291 | 16 |
| Language Modeling | FineWeb 10B 2K sequence length | Validation Loss: 3.193 | 16 |
| Language Modeling | FineWeb 10B 1K sequence length | Validation Loss: 3.1705 | 16 |
| Suffix completion under prefix compression | FineWeb held-out | ∆PPL: 0.259 | 16 |
| Language Modeling | FineWeb-Edu 10K steps (val) | Validation Loss: 3.372 | 15 |
| Language Modeling | FineWeb-Edu 10K steps (train) | Training Loss: 3.338 | 15 |
| Language Modeling | FineWeb-Edu 300M×5B (val) | PPL: 23.6 | 15 |
| Language Modeling | FineWeb-Edu | Throughput (tokens/s): 71,600 | 15 |
| Language Model Pre-training | FineWeb-Edu (pre-training) | Perplexity: 14.13 | 12 |
| Language Modeling | FineWeb-Edu | Perplexity (BF16): 8.49 | 12 |
| Information Capacity Measurement | FineWeb-Edu | Rank: 1 | 11 |
| Information Capacity Measurement | FineWeb Ch Edu | Rank: 1 | 11 |
| Language Modeling | FineWeb 100M token (val) | Perplexity: 12.11 | 9 |
| Language Modeling | FineWeb-Edu (held-out) | Final 50 Train Loss: 4.733 | 8 |
| LLM Pretraining | FineWeb-Edu (val) | BPB: 0.861 | 8 |
Showing 25 of 39 rows