| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | FineWeb (val) | Validation Loss | 2.03 | 156 |
| Language Modeling | FineWeb 100M token (val) | Perplexity | 12.11 | 9 |
| LLM Pretraining | FineWeb-Edu (val) | BPB | 0.861 | 8 |
| LLM Pretraining | FineWeb-Edu (train) | Training Loss | 2.964 | 8 |
| Data Filtering | FineWeb-edu CC-MAIN-2024-10 | Recall@30 | 81.9 | 7 |
| Speculative Decoding | Fineweb-edu distillation 8B to 300M | Spec. Accept % | 62 | 7 |
| Language Modeling | FineWeb-Edu | PPL | 12.318 | 6 |
| Speculative Decoding | Fineweb-edu 1.0 (test) | Speculative Accept Rate | 0.735 | 6 |
| Model Calibration | Fineweb-edu 1.0 (test) | ECE | 0.002 | 6 |
| Language Modeling | Fineweb-edu 1.0 (test) | LM Loss | 2.32 | 6 |
| Language Identification | FineWeb2 | Macro F1 | 94.52 | 5 |
| Language Modeling | Fineweb-Edu | Accuracy | 0.3919 | 3 |
| Pre-training | FineWeb 124M Transformer (val) | Training Time to Loss 3.28 (min) | 2.1 | 3 |
| Language model training | FineWeb (val) | Training Time (s) | 95.2 | 2 |
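
Several of the loss-style metrics in this table are simple transforms of per-token cross-entropy, and ECE is a standard binned calibration statistic. The sketch below illustrates those relationships only; the function names are ours, and it assumes the reported losses are mean (or summed, for BPB) cross-entropy in nats, which the leaderboard does not state explicitly.

```python
import math
import numpy as np

def perplexity(mean_nll_nats: float) -> float:
    """Per-token perplexity from a mean cross-entropy loss in nats."""
    return math.exp(mean_nll_nats)

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Bits-per-byte (BPB): total cross-entropy converted from nats to bits, divided by byte count."""
    return total_nll_nats / (math.log(2) * total_bytes)

def expected_calibration_error(confidences, is_correct, n_bins: int = 10) -> float:
    """Equal-width-bin ECE: bin-size-weighted mean |accuracy - mean confidence|."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(is_correct, dtype=float)
    # Assign each prediction to one of n_bins equal-width confidence bins on [0, 1].
    bin_ids = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(hit[mask].mean() - conf[mask].mean())
    return ece

# Example: a validation loss of 2.03 nats/token corresponds to perplexity exp(2.03) ≈ 7.6.
print(round(perplexity(2.03), 2))
```

Note that perplexity and loss numbers are only comparable across rows when the tokenizer and evaluation split match, which is why BPB (a byte-level measure) is often preferred for cross-tokenizer comparisons.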