| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | LAMBADA | Accuracy 86.9 | 268 |
| Language Modeling | LAMBADA | Perplexity 3.1 | 150 |
| Word Prediction | LAMBADA | Accuracy 86.9 | 148 |
| Language Modeling | Lambada OpenAI | Accuracy 70.83 | 127 |
| Language Modeling | LAMBADA | Accuracy 79.4 | 76 |
| Language Modeling | LAMBADA (test) | Accuracy 88.61 | 71 |
| Word Prediction | LAMBADA (test) | Accuracy 87.15 | 53 |
| Language Modeling | LAMBADA zero-shot (test) | Accuracy (zero-shot) 69.12 | 44 |
| Language Modeling | Lambada Standard | Accuracy 60.8 | 36 |
| Language Modeling | LAMBADA standard (LS) | Accuracy (LAMBADA) 65.57 | 30 |
| Word Prediction | LAMBADA OpenAI | Accuracy 71.4 | 26 |
| Language Modeling | LAMBADA | Delta (%) 43.3 | 25 |
| Language Modeling | Lambada (val) | Perplexity 12.37 | 24 |
| Reading Comprehension | Lambada | Accuracy 80.5 | 24 |
| Language Modeling | LAMBADA multilingual (test) | LAMBADA Score (DE) 140.97 | 20 |
| Word Prediction | LAMBADA standard | Accuracy 65.57 | 20 |
| Cloze-style completion | Lambada OpenAI | Accuracy 75.65 | 20 |
| Word Prediction | LAMBADA CONTROL (all) | Accuracy 36 | 20 |
| Language Modeling | LAMBADA (dev) | Perplexity 12.34 | 20 |
| Language Modeling | Lambada | EM Accuracy 89.7 | 18 |
| Language Modeling | Lambada (OpenAI split) | PPL 3.11 | 13 |
| Reading Comprehension | LAMBADA (test) | Accuracy 66.51 | 13 |
| Word Prediction | LAMBADA CONTROL (context) | Accuracy 65.6 | 13 |
| Language Modeling | LAMBADA (control) | Perplexity 94 | 12 |
| Word Prediction | Lambada | Accuracy (Original) 82.2 | 11 |