| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 1.08 | 2,320 |
| Open-ended text generation | WikiText-2 | COH Score | 0.811 | 112 |
| Perplexity | WikiText-2 | Perplexity | 2.81 | 97 |
| Language Modeling | WikiText-2 | Average Kurtosis | 93.039 | 72 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS) | 7.7 | 70 |
| Language Modeling | WikiText-2 raw (test) | Perplexity | 5.674 | 63 |
| Text Generation | WikiText-2 | Perplexity | 10.98 | 50 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 5.82 | 40 |
| Language Modeling | WikiText-2 | Mauve | 0.9 | 33 |
| Language Modeling | WikiText-2 | WikiText-2 Score | 4.93 | 32 |
| Feature Space Preservation | WikiText-2 | Cosine Similarity | 100 | 32 |
| Language Modeling | WikiText-2 context length 2048 (val) | WikiText-2 PPL | 9.92 | 24 |
| Language Modeling | WikiText-2 | Perplexity | 9.6 | 22 |
| Language Modeling | WikiText-2 Llama-3.1-8B-Instruct (test) | Perplexity | 7.2 | 22 |
| Language Modeling | WikiText-2 v1 (val) | Perplexity | 42.41 | 20 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 4.91 | 19 |
| Language Modeling | WikiText-2 10K-word evaluator standardized | PPL Delta (%) | 11.9 | 18 |
| Language Modeling | WikiText-2 Llama 2 & 3 (test) | PPL (Llama 2, Config 7) | 5.47 | 16 |
| Language Generation | WikiText-2 (test) | Perplexity | 3.319 | 16 |
| Text Generation | WikiText-2 | ROUGE-1 | 39.1 | 15 |
| Language Modeling | WikiText-2 context length 4096 (test) | PPL (WikiText-2) | 5.11 | 15 |
| Language Modeling | WikiText-2 2017 (test) | PPL (Uniform) | 7.5 | 12 |
| Language Modeling | WikiText-2 | Perplexity (Baseline) | 5.44 | 12 |
| Language Modeling | WikiText-2 Standard (val) | Perplexity | 40.2 | 12 |
| Causal Prediction | WikiText-2 (val) | Min Validation Loss | 5.4856 | 11 |
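Most rows above report perplexity, while the last row reports a validation loss. As a minimal sketch (not taken from any of the listed results), perplexity is the exponential of the mean per-token negative log-likelihood. The helper names `perplexity_from_nlls` and `perplexity_from_mean_loss` are hypothetical, and the code assumes losses are natural-log (nats) per token unless a different `base` is passed. The table does not state the units of the "Min Validation Loss" row, so converting it is only illustrative.

```python
import math
from typing import Sequence


def perplexity_from_nlls(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood).

    Assumption: each entry is -ln p(token | context), i.e. in nats.
    """
    if len(token_nlls) == 0:
        raise ValueError("need at least one token NLL")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


def perplexity_from_mean_loss(mean_loss: float, base: float = math.e) -> float:
    """Convert an already-averaged per-token loss to perplexity.

    Use base=math.e for losses in nats and base=2.0 for losses in bits.
    """
    return base ** mean_loss


if __name__ == "__main__":
    # A model that is uniform over 4 choices at every step has perplexity 4.
    print(perplexity_from_nlls([math.log(4)] * 10))
    # Hypothetical reading of the last row: mean loss 5.4856 nats per token.
    print(round(perplexity_from_mean_loss(5.4856), 2))
```

Under that nats-per-token assumption, a loss of 5.4856 corresponds to a perplexity of roughly 241. Perplexity values are also sensitive to tokenization and context length (note the separate 2048 and 4096 context-length rows), so rows with different evaluation setups are not directly comparable.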