| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 1.61 | 1,624 |
| Open-ended text generation | WikiText-2 | COH Score | 0.811 | 112 |
| Language Modeling | WikiText-2 raw (test) | Perplexity | 5.674 | 57 |
| Feature Space Preservation | WikiText-2 | Cosine Similarity | 100 | 32 |
| Language Modeling | WikiText-2 context length 2048 (val) | WikiText-2 PPL | 9.92 | 24 |
| Language Modeling | WikiText-2 Llama-3.1-8B-Instruct (test) | Perplexity | 7.2 | 22 |
| Language Modeling | WikiText-2 v1 (val) | Perplexity | 42.41 | 20 |
| Language Modeling | WikiText-2 Llama 2 & 3 (test) | PPL (Llama 2, Config 7) | 5.47 | 16 |
| Language Modeling | WikiText-2 context length 4096 (test) | PPL (WikiText-2) | 5.11 | 15 |
| Language Modeling | WikiText-2 Standard (val) | Perplexity | 40.2 | 12 |
| Causal Prediction | WikiText-2 (val) | Min Validation Loss | 5.4856 | 11 |
| Perplexity | WikiText-2 | Perplexity | 5.5 | 10 |
| White-box robustness against single point failure attacks | WikiText-2 | Original Perplexity (PPL) | 5.03 | 8 |
| Language Modeling | WikiText-2 | Perplexity (Baseline) | 11.02 | 8 |
| Language Generation | WikiText-2 (test) | Perplexity | 3.319 | 8 |
| Language Modeling | WikiText-2 context length 2048 (test) | Perplexity | 7.15 | 7 |
| Language Modeling | WikiText-2 word-level (dev) | PPL | 66.9 | 7 |
| Language Modeling | WikiText-2 TinyLlama | ΔPPL | 0.0036 | 6 |
| Language Modeling | WikiText-2 Mistral-7B | ΔPPL | 0.001 | 6 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS) | 28.45 | 5 |
| Language Modeling | WikiText-2 Shifted | Shifted PPL | 98.2 | 5 |
| Language Modeling | WikiText-2 raw-v1 (val) | Cross Entropy (CE) | 2.744 | 5 |
| Language Modeling | WikiText-2 context length 8192 (test) | Perplexity | 6.5 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (test) | NLL | 5.12 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (val) | NLL | 5.13 | 5 |
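The table mixes three closely related metrics: perplexity (PPL), cross entropy (CE), and negative log-likelihood (NLL). They are interchangeable via one identity: perplexity is the exponential of the mean per-token NLL. As a minimal sketch (assuming the reported CE/NLL values are natural-log nats per token, which leaderboards do not always state), the 2.744 CE reported for WikiText-2 raw-v1 above would correspond to a perplexity of roughly 15.5:

```python
import math

def perplexity(nlls):
    """Perplexity = exp of the mean per-token negative
    log-likelihood (cross entropy in nats)."""
    return math.exp(sum(nlls) / len(nlls))

# A cross entropy of 2.744 nats/token implies PPL ≈ 15.5,
# assuming natural-log units (an assumption, not stated above).
print(perplexity([2.744]))
```

If a result is instead reported in bits per token, replace `math.exp` with `2 ** mean`; comparing PPL, CE, and NLL rows directly without checking the log base is a common source of apparent discrepancies.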