| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | Pre-training corpus (train) | Perplexity | 15.71 | 20 |
| Language Modeling | Pre-training corpus | Loss | 1.577 | 9 |
| Next token prediction | Pre-training corpus (train) | Token Accuracy | 66.4 | 9 |
| Language Modeling | 1.3B 26B-token pre-training corpus (val) | Validation Cross-Entropy | 2.077 | 3 |
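The table mixes three related metrics: loss/cross-entropy, perplexity, and token accuracy. The sketch below shows how they are commonly computed. It assumes that "Loss" and "Validation Cross-Entropy" mean the mean per-token negative log-likelihood in nats, that perplexity is its exponential, and that token accuracy is the percentage of positions where the top-1 prediction matches the true next token. The table does not state these definitions, and the rows come from different datasets and splits, so their values are not necessarily convertible into one another. The function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood (in nats) of the probabilities a model
    assigned to the true next tokens. Assumes natural-log loss."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity as the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_ce_nats)


def token_accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Percentage of positions where the top-1 predicted token id equals
    the true next token id."""
    if not target or len(predicted) != len(target):
        raise ValueError("predicted and target must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predicted, target))
    return 100.0 * correct / len(target)


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125]           # toy probabilities of the true tokens
    ce = cross_entropy(probs)            # (ln 2 + ln 4 + ln 8) / 3 = 2 ln 2
    print(ce, perplexity(ce))            # perplexity ≈ 4.0
    print(token_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 3 of 4 correct
```

Under these assumptions, a lower cross-entropy always implies a lower perplexity, because the exponential is monotonic. Token accuracy only looks at the top-1 prediction, so it can move independently of the other two.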