| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | OpenWebText2 (test) | Perplexity 16.2 | 104 |
| Unconditional Text Generation | OpenWebText | Gen. PPL 1.21 | 100 |
| Language Modeling | OpenWebText | Perplexity 11 | 91 |
| Text Generation | OpenWebText | Perplexity 3.18 | 86 |
| Language Modeling | OpenWebText (val) | Validation Loss 2.6091 | 80 |
| Unconditional generation | OpenWebText (OWT) L=1024 (held-out) | MAUVE 1 | 45 |
| Language Modeling | OpenWebText (OWT) (val) | Perplexity 7.77 | 42 |
| Next Token Prediction | OpenWebText | PPL 18.68 | 30 |
| Next Token Prediction | OpenWebText (held-out) | ID PPL 18.53 | 30 |
| Clustering | OpenWebText | Clustering Score 0.6222 | 30 |
| Sentiment Steering | OpenWebText Neutral to Negative (test) | Perplexity (PPL) 12.48 | 27 |
| Sentiment Steering | OpenWebText Neutral to Positive (test) | Perplexity (PPL) 12.48 | 27 |
| Language Modeling | NanoGPT OpenWebText | Throughput (tokens/s) 391,100 | 24 |
| Unconditional Text Generation | OpenWebText (test) | LLAMA2 Score 692.3 | 21 |
| Language Modeling | OpenWebText (train) | Train Loss 2.5243 | 21 |
| Embedding Space Analysis | OpenWebText | Iso 0.98 | 18 |
| Language Modeling | OpenWebText (test) | Loss 2.65 | 18 |
| Language Modeling | OpenWebText standard (test) | Perplexity 20.08 | 17 |
| Language Modeling | OpenWebText (held-out set) | PPL 11.5 | 16 |
| Language Modeling | OpenWebText GPT-2 (test) | Perplexity 17.94 | 13 |
| Unconditional generation | OpenWebText L=2048 (test) | Gen. PPL 13.2 | 12 |
| Unconditional generation | OpenWebText L=1024 (test) | Generation Perplexity 14.1 | 12 |
| Language Modeling | OpenWebText2 (val) | Perplexity 17.12 | 12 |
| Sentiment Steering | OpenWebText Negative prompts (test) | Positivity Score 0.59 | 12 |
| Text Generation | OpenWebText (OWT) GPT-2 tokenizer (val) | PPL 15.36 | 12 |