| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---:|---:|
| Language Modeling | LAMBADA | Accuracy | 86.9 | 412 |
| Language Modeling | LAMBADA | Perplexity | 3.1 | 198 |
| Word Prediction | LAMBADA | Accuracy | 86.9 | 192 |
| Language Modeling | Lambada OpenAI | Accuracy | 70.83 | 127 |
| Language Modeling | LAMBADA (test) | Perplexity | 4 | 109 |
| Language Modeling | LAMBADA | Accuracy | 79.4 | 103 |
| Language Modeling | Lambada | Perplexity (Lambada) | 14.36 | 70 |
| Word Prediction | LAMBADA (test) | Accuracy | 87.15 | 53 |
| Language Modeling | LAMBADA zero-shot (test) | Accuracy (zero-shot) | 69.12 | 44 |
| Language Modeling | LAMBADA | PPL Change (%) | 0.2 | 41 |
| Language Modeling | Lambada (val) | Perplexity | 10.14 | 39 |
| Language Modeling | Lambada Standard | Accuracy | 60.8 | 36 |
| Language Modeling | LAMBADA standard (LS) | Accuracy (LAMBADA) | 65.57 | 30 |
| Word Prediction | LAMBADA OpenAI | Accuracy | 71.4 | 29 |
| Language Modeling | LAMBADA | Delta (%) | 43.3 | 25 |
| Reading Comprehension | Lambada | Accuracy | 80.5 | 24 |
| Language Modeling | Lambada (OpenAI split) | PPL | 3.11 | 22 |
| Language Modeling | LAMBADA multilingual (test) | LAMBADA Score (DE) | 140.97 | 20 |
| Word Prediction | LAMBADA standard | Accuracy | 65.57 | 20 |
| Cloze-style completion | Lambada OpenAI | Accuracy | 75.65 | 20 |
| Word Prediction | LAMBADA CONTROL (all) | Accuracy | 36 | 20 |
| Language Modeling | LAMBADA (dev) | Perplexity | 12.34 | 20 |
| Language Modeling | Lambada | EM Accuracy | 89.7 | 18 |
| Language Modeling | LAMBADA | PPL | 11.39 | 14 |
| Question Answering | LAMBADA | Accuracy | 73.2 | 14 |