| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 2.56 | 1,949 |
| Language Modeling | WikiText | PPL | 0.2838 | 732 |
| Language Modeling | WikiText (test) | Perplexity | 5.49 | 62 |
| Language Modeling | WikiText (val) | Perplexity | 12.51 | 54 |
| Language Modeling | Wikitext | Wikitext PPL | 12.85 | 45 |
| Language Modeling | WikiText (held-out) | Perplexity (PPL) | 9.8 | 25 |
| Language Modeling | WikiText-103 | Throughput (tokens/s) | 159,000 | 21 |
| Language Modeling | WikiText v1 (test) | Perplexity | 13.33 | 18 |
| Language Modeling | WikiText (WT) | Relative PPL Change (%) | 31 | 16 |
| Language Modeling | WikiText | PPL Change (%) | 1.7 | 16 |
| Language Modeling | WikiText-2 vLLM harness (test) | Perplexity (PPL) | 8.87 | 12 |
| Privacy Measurement | WikiText | Epsilon | 0 | 12 |
| Open-ended Text Generation | Wikitext (test) | Diversity (DIV) | 95 | 12 |
| Prefilling Profiling | WikiText (test) | Time (s) | 38 | 10 |
| Language Modeling | Wikitext zero-shot | Perplexity | 25.75 | 10 |
| Language Modeling | WikiText (test) | ROUGE Score | 64.14 | 8 |
| Language Modeling | WikiText 1K | Perplexity | 13.8 | 7 |
| Model Compression Time | WikiText | Compression Time (s) | 196.34 | 6 |
| Membership Inference Attack | WikiText | TPR @ 0.1% FPR | 14 | 6 |
| Knowledge Evaluation | WikiText (eval) | BPB | 0.777 | 6 |
| Autoregressive Language Modeling | WikiText-103 (first 10M tokens) | Perplexity (PPL) | 90.5 | 5 |
| Language Modeling | WikiText byte-level | Wikitext PPL | 1.5798 | 5 |
| Masked Reconstruction | WikiText-103 | PPL | 4.94 | 5 |
| Text Generation | Wikitext | Coherence: CD Better Rate | 88.7 | 4 |
| Language Modeling | Wikitext zero-shot | Gap Closed (%) | 40.8 | 3 |
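Most rows above report perplexity (PPL), and a few report bits-per-byte (BPB). Both are standard transformations of the model's average negative log-likelihood, so a minimal sketch of the conversions may be useful; the function names and the example loss values below are illustrative, not taken from any of the listed results:

```python
import math

def perplexity(nll_per_token):
    """Perplexity: exp of the mean negative log-likelihood (in nats) per token."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

def bits_per_byte(total_nll_nats, total_bytes):
    """BPB: total negative log-likelihood converted from nats to bits,
    normalized by the byte length of the evaluated text."""
    return total_nll_nats / (total_bytes * math.log(2))

# Illustrative per-token losses (nats), not real WikiText numbers.
losses = [2.1, 1.8, 2.4, 2.0]
print(perplexity(losses))
print(bits_per_byte(sum(losses), total_bytes=32))
```

Lower is better for both metrics; BPB is often preferred for byte-level or cross-tokenizer comparisons because, unlike token-level perplexity, it does not depend on the tokenizer's vocabulary.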