| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL 1.2 | 2,333 | |
| Language Modeling | WikiText | PPL 0.2838 | 740 | |
| Language Modeling | WikiText | Word Perplexity 3.12 | 234 | |
| Language Modeling | WikiText-2 | Perplexity 4.88 | 105 | |
| Language Modeling | Wikitext | Wikitext PPL 12.85 | 87 | |
| Language Modeling | WikiText (test) | Perplexity 4.88 | 66 | |
| Language Modeling | WikiText (val) | Perplexity 12.51 | 62 | |
| Language Modeling | WikiText v1 (test) | Perplexity 13.33 | 30 | |
| Language Modeling | WikiText (held-out) | Perplexity (PPL) 9.8 | 25 | |
| Language Modeling | WikiText-103 | Throughput (tokens/s) 159,000 | 21 | |
| Language Modeling | WikiText (WT) | Relative PPL Change (%) 31 | 16 | |
| Language Modeling | WikiText | PPL Change (%) 1.7 | 16 | |
| Language Modeling | WikiText-103 | Bits Per Character (BPC) 2 | 13 | |
| Language Modeling | WikiText-2 vLLM harness (test) | Perplexity (PPL) 8.87 | 12 | |
| Language Modeling | Wikitext zero-shot | Perplexity 25.75 | 12 | |
| Privacy Measurement | WikiText | Epsilon 0 | 12 | |
| Open-ended Text Generation | Wikitext (test) | Diversity (DIV) 95 | 12 | |
| Language Modeling | WikiText-103 20w x 2048 | Perplexity (PPL) 9.603 | 10 | |
| Prefilling Profiling | WikiText (test) | Time (s) 38 | 10 | |
| Language Modeling | WikiText 1,000-example evaluation slice (test) | Perplexity 12.723 | 9 | |
| Language Modeling | WikiText zero-shot transfer (test) | Perplexity 33.22 | 8 | |
| Language Modeling | WikiText (test) | ROUGE Score 64.14 | 8 | |
| Language Modeling | WikiText 1K | Perplexity 13.8 | 7 | |
| Language Modeling | wikitext | Perplexity (word) 8.0668 | 6 | |
| Model Compression Time | WikiText | Compression Time (s) 196.34 | 6 | |