
FineWeb

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb (val) | Validation Loss | 2.03 | 159 |
| Language Modeling | FineWeb-Edu | PPL | 10.24 | 26 |
| Language Modeling | FineWeb-Edu | Throughput (tokens/s) | 71,600 | 15 |
| Information Capacity Measurement | FineWeb-Edu | Rank | 1 | 11 |
| Information Capacity Measurement | FineWeb Ch Edu | Rank | 1 | 11 |
| Language Modeling | FineWeb 100B | Perplexity (PPL) | 12.29 | 9 |
| Language Modeling | FineWeb 100M token (val) | Perplexity | 12.11 | 9 |
| LLM Pretraining | FineWeb-Edu (val) | BPB | 0.861 | 8 |
| LLM Pretraining | FineWeb-Edu (train) | Training Loss | 2.964 | 8 |
| Data Filtering | FineWeb-Edu CC-MAIN-2024-10 | Recall@30 | 81.9 | 7 |
| Speculative Decoding | FineWeb-Edu distillation 8B to 300M | Spec. Accept % | 62 | 7 |
| Speculative Decoding | FineWeb-Edu 1.0 (test) | Speculative Accept Rate | 0.735 | 6 |
| Model Calibration | FineWeb-Edu 1.0 (test) | ECE | 0.002 | 6 |
| Language Modeling | FineWeb-Edu 1.0 (test) | LM Loss | 2.32 | 6 |
| Language Identification | FineWeb2 | Macro F1 | 94.52 | 5 |
| Language Modeling | FineWeb Nanochat 5M token (val) | BPB (val) | 2.681 | 4 |
| Language Modeling | FineWeb-Edu | Accuracy | 0.3919 | 3 |
| Pre-training | FineWeb 124M Transformer (val) | Training Time to Loss 3.28 (min) | 2.1 | 3 |
| Language Model Training | FineWeb (val) | Training Time (s) | 95.2 | 2 |