| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 1.61 | 841 |
| Open-ended text generation | WikiText-2 | COH Score | 0.811 | 112 |
| Language Modeling | WikiText-2 raw (test) | Perplexity | 5.674 | 57 |
| Language Modeling | WikiText-2 v1 (val) | Perplexity | 42.41 | 19 |
| Language Modeling | WikiText-2 Llama 2 & 3 (test) | PPL (Llama 2, Config 7) | 5.47 | 16 |
| Language Modeling | WikiText-2 Standard (val) | Perplexity | 40.2 | 12 |
| Causal Prediction | WikiText-2 (val) | Min Validation Loss | 5.4856 | 11 |
| Language Modeling | WikiText-2 | Perplexity (Baseline) | 11.02 | 8 |
| Language Generation | WikiText-2 (test) | Perplexity | 3.319 | 8 |
| Language Modeling | WikiText-2 context length 2048 (test) | Perplexity | 7.15 | 7 |
| Language Modeling | WikiText-2 word-level (dev) | PPL | 66.9 | 7 |
| Language Modeling | WikiText-2 context length 8192 (test) | Perplexity | 6.5 | 5 |
| Language Modeling | WikiText-2 context length 4096 (test) | PPL (WikiText-2) | 6.36 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (test) | NLL | 5.12 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (val) | NLL | 5.13 | 5 |
| Language Modeling | WikiText-2 All tokens (test) | NLL | 1.66 | 5 |
| Language Modeling | WikiText-2 All tokens (val) | NLL | 1.7 | 5 |
| Language Modeling | WikiText-2 LLaMA-2 7B | PPL | 5.12 | 3 |
| Character-level Language Modeling | WikiText-2 (val) | PPL (Validation) | 3.49 | 3 |
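Some rows report per-token negative log-likelihood (NLL) and others report perplexity. The sketch below shows how the two are related, under the assumption that NLL is the mean per-token value in nats: PPL = exp(mean NLL). It is illustrative only. The helper names are hypothetical, and rows in the table differ in tokenization, context length, and split, so a converted value is not directly comparable to another row's reported perplexity.

```python
import math


def perplexity_from_nll(token_nlls):
    """Perplexity as exp of the mean per-token NLL (natural log, i.e. nats).

    Hypothetical helper; assumes token_nlls are per-token NLLs in nats.
    """
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


def nll_from_perplexity(ppl):
    """Inverse mapping: mean per-token NLL in nats for a given perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


# Hypothetical per-token NLLs (not taken from any table row).
example = [1.5, 1.7, 1.8]
print(perplexity_from_nll(example))
```

If an NLL is reported in bits rather than nats, the base changes accordingly (PPL = 2^NLL).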