| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language modelling | LM1B (test) | Perplexity 20.86 | 151 |
| Language Modeling | LM1B | PPL (Generalized) 48.5 | 93 |
| Text Generation | LM1B (test) | Entropy 2.46 | 85 |
| Language Modeling | LM1B (val) | Perplexity 16.57 | 67 |
| Unconditional generation | LM1B sequence length 128 | Generation Perplexity (PPL) 40.2 | 43 |
| Language Modeling | LM1B | Perplexity 22.8 | 39 |
| Unconditional generation | LM1B | Generation Perplexity 36.42 | 31 |
| Unconditional Text Generation | LM1B | Entropy 4.29 | 24 |
| Text Generation | LM1B | Perplexity (PPL) 68.11 | 24 |
| Hyperparameter Optimization | PD1-LM1B (val) | Validation Error 0.628 | 24 |
| Language Generation | LM1B 1024 sequences of length 128 | Generative PPL 186.79 | 20 |
| Language Modeling | LM1B zero-shot | Perplexity 51.25 | 20 |
| Language Modeling | LM1B L=128 (test) | NELBO PPL 24.53 | 17 |
| Language Modeling | LM1B (test) | Block Efficiency 8.94 | 15 |
| Speculative Decoding | LM1B (test) | BE 7.88 | 10 |
| Language Modeling | LM1B GPT-2 small model size equivalent (test) | Perplexity 20.53 | 10 |
| Autoregressive Language Modeling | LM1B | PPL 21.5 | 7 |
| Language Modeling | LM1B GPT2 | PPL 65.629 | 4 |
| Language Modeling | LM1B ctx len. 128 (val) | PPL (Val) 25.72 | 3 |
| Text Generation | LM1B (val) | Perplexity 51.25 | 1 |
| Auto-regressive language modeling | LM1B 1.0 (test) | Metric - | 0 |