| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb-Edu (test) | Perplexity (Test) | 16.46 | 58 |
| Language Modeling | FineWeb-Edu (val) | Perplexity | 8.71 | 51 |
| Language Modeling | FineWeb-Edu 500M-token (val) | Valid Loss | 2.221 | 18 |
| Unconditional Generation | FineWeb-Edu Mask source 170M-parameter (train) | Entropy | 8.5 | 17 |
| Unconditional Generation | FineWeb-Edu Uniform source 170M-parameter (train) | Entropy | 7 | 17 |
| Language Modeling | FineWeb-EDU (train) | Loss | 2.993 | 16 |
| Language Modeling | FineWeb-Edu 100B (val) | CE Loss | 2.62 | 13 |
| Soft Search | FineWeb-Edu English, 1.4T tokens (test) | Similarity Score | 100 | 12 |
| Language Modeling | FineWeb-Edu 100B (eval) | Perplexity | 13.75 | 9 |
| In-context Learning | Fineweb-Edu 16.8B tokens | ARC-c Accuracy | 36.86 | 8 |
| Language Modeling | Fineweb-edu distillation 8B to 300M | LM Loss | 2.74 | 7 |
| Language Modeling | FineWeb-Edu 10B (held-out last unused bin) | Perplexity | 15.34 | 5 |
| Language Modeling | FineWeb-Edu 10B (val) | Validation CE | 3.01 | 5 |
| Language Modeling | FineWeb-Edu 20B tokens (val) | Final PPL | 15.03 | 3 |
| Language Modeling | FineWeb-edu 12 text passages (held-out) | Average Loss | 1.683 | 3 |
| Language Modeling | FineWeb-Edu 1.4B tokens (val) | Loss | 3.271 | 3 |