| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb (val) | Validation Loss | 2.03 | 217 |
| Language Modeling | FineWeb-Edu | PPL | 8.32 | 141 |
| Language Modeling | FineWeb 10B (val) | Validation Perplexity | 19.54 | 56 |
| Causal Faithfulness Evaluation | FineWeb Temporal | nAOPC | 0.985 | 28 |
| Language Modeling | FineWeb 100B (val) | Validation Perplexity | 20.98 | 28 |
| Language Modeling | FineWeb 10B | Validation Perplexity | 20.69 | 28 |
| Language Modeling | FineWeb 10BT (train) | Loss | 2.5966 | 26 |
| Language Modeling | FineWeb-EDU | LM Loss | 2.89 | 19 |
| Language Modeling | Fineweb 100B | Perplexity (PPL) | 2.6 | 19 |
| Language Modeling | FineWeb 10B 8K sequence length | Validation Loss | 3.3171 | 16 |
| Language Modeling | FineWeb 10B 4K sequence length | Validation Loss | 3.2291 | 16 |
| Language Modeling | FineWeb 10B 2K sequence length | Validation Loss | 3.193 | 16 |
| Language Modeling | FineWeb 10B 1K sequence length | Validation Loss | 3.1705 | 16 |
| Suffix completion under prefix compression | FineWeb held-out | ∆PPL | 0.259 | 16 |
| Language Modeling | FineWeb-Edu 10K steps (val) | Validation Loss | 3.372 | 15 |
| Language Modeling | FineWeb-Edu 10K steps (train) | Training Loss | 3.338 | 15 |
| Language Modeling | FineWeb-Edu 300M×5B (val) | PPL | 23.6 | 15 |
| Language Modeling | FineWeb-Edu | Throughput (tokens/s) | 71,600 | 15 |
| Language Model Pre-training | FineWeb-Edu (pre-training) | Perplexity | 14.13 | 12 |
| Language Modeling | FineWeb-Edu | Perplexity (BF16) | 8.49 | 12 |
| Information Capacity Measurement | FineWeb-Edu | Rank | 1 | 11 |
| Information Capacity Measurement | FineWeb Ch Edu | Rank | 1 | 11 |
| Language Modeling | FineWeb 100M token (val) | Perplexity | 12.11 | 9 |
| Language Modeling | FineWeb-Edu (held-out) | Final 50 Train Loss | 4.733 | 8 |
| LLM Pretraining | FineWeb-Edu (val) | BPB | 0.861 | 8 |