| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb-Edu (test) | Perplexity (Test) | 20.7 | 49 |
| Language Modeling | FineWeb-Edu (val) | Final Validation Loss | 3.003 | 18 |
| Language Modeling | FineWeb-Edu 500M-token (val) | Valid Loss | 2.221 | 18 |
| Language Modeling | FineWeb-Edu 100B (val) | CE Loss | 2.62 | 13 |
| Soft Search | FineWeb-Edu English, 1.4T tokens (test) | Similarity Score | 100 | 12 |
| Language Modeling | FineWeb-EDU (train) | Loss | 2.993 | 10 |
| Language Modeling | FineWeb-Edu 100B (eval) | Perplexity | 13.75 | 9 |
| In-context Learning | Fineweb-Edu 16.8B tokens | ARC-c Accuracy | 36.86 | 8 |
| Language Modeling | Fineweb-edu distillation 8B to 300M | LM Loss | 2.74 | 7 |
| Language Modeling | FineWeb-edu 12 text passages (held-out) | Average Loss | 1.683 | 3 |
| Language Modeling | FineWeb-Edu 1.4B tokens (val) | Loss | 3.271 | 3 |