| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | LAMBADA | Accuracy 86.9 | 183 |
| Word Prediction | LAMBADA | Accuracy 86.9 | 112 |
| Language Modeling | LAMBADA | Perplexity 3.1 | 99 |
| Language Modeling | LAMBADA (test) | Accuracy 88.61 | 71 |
| Language Modeling | Lambada OpenAI | Accuracy 69.1 | 61 |
| Word Prediction | LAMBADA (test) | Accuracy 87.15 | 53 |
| Language Modeling | LAMBADA zero-shot (test) | Accuracy (zero-shot) 69.12 | 44 |
| Cloze-style completion | Lambada OpenAI | Accuracy 75.65 | 20 |
| Word Prediction | LAMBADA CONTROL (all) | Accuracy 36 | 20 |
| Language Modeling | Lambada (OpenAI split) | PPL 3.11 | 13 |
| Reading Comprehension | LAMBADA (test) | Accuracy 66.51 | 13 |
| Word Prediction | LAMBADA CONTROL (context) | Accuracy 65.6 | 13 |
| Language Modeling | LAMBADA (control) | Perplexity 94 | 12 |
| Language Modeling | LAMBADA (dev) | Perplexity 134 | 12 |
| Word Prediction | Lambada | Accuracy (Original) 82.2 | 11 |
| Language Modeling | Lambada | EM Accuracy 89.7 | 11 |
| Reading Comprehension | Lambada | Accuracy 80.5 | 10 |
| Language Modeling | Lambada (val) | Perplexity 50.05 | 8 |
| Text coherence assessment | LAMBADA | Coherence Score 53.79 | 8 |
| Language Modeling | Lambada Standard | PPL 14.42 | 7 |
| Word Prediction | LAMBADA OpenAI | Accuracy 49.31 | 6 |
| Reading Comprehension | LAMBADA (dev) | Accuracy 58.31 | 4 |
| Reading Comprehension | LAMBADA (control) | Accuracy 48.01 | 4 |
| Cloze-style word prediction | LAMBADA medium model (test) | Accuracy 61.3 | 3 |
| Reading Comprehension | LAMBADA context (test) | Accuracy 68.88 | 3 |