| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | Penn Treebank (test) | Perplexity 46.34 | 411 |
| Language Modeling | Penn Treebank (val) | Perplexity 46.64 | 178 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity 14.72 | 120 |
| Character-level Language Modeling | Penn Treebank (test) | BPC 1.175 | 113 |
| Dependency Parsing | Penn Treebank (PTB) (test) | LAS 96.4 | 80 |
| Language Modeling | Penn Treebank word-level (test) | Perplexity 49.95 | 72 |
| Language Modeling | Penn Treebank (PTB) (val) | Perplexity 46.63 | 70 |
| Part-of-Speech Tagging | Penn TreeBank (test) | Accuracy 97.96 | 64 |
| Constituency Parsing | Penn Treebank WSJ (section 23 test) | F1 Score 95.8 | 55 |
| Character-level Language Modeling | Penn Treebank char-level (test) | BPC 1.16 | 25 |
| Unlabeled Parsing | Penn Treebank WSJ (test) | F1 (mean) 84.3 | 25 |
| Dependency Parsing | Penn Treebank (PTB) Section 23 v2.2 (test) | UAS 95.66 | 17 |
| Unsupervised Constituency Parsing | Penn TreeBank English (test) | Mean S-F1 69.6 | 16 |
| Unsupervised Parsing | Penn Treebank WSJ Section 23 (test) | F1 Score 57.22 | 15 |
| POS Tagging | Penn Treebank (PTB) Section 23 v2.2 (test) | POS Accuracy 97.97 | 15 |
| Constituency Parsing | Penn Treebank shortest 25% of samples < 128 tokens (test) | Bracket Precision 76.6 | 14 |
| Language Modeling | Penn Treebank (dev) | Perplexity (PPL) 56.5 | 14 |
| Unlabeled Parsing | Penn Treebank WSJ10 (test) | F1 (max) 82.9 | 14 |
| Syntactic Parsing | English Penn Treebank (test) | Speed (Sents/s) 1,127 | 11 |
| Language Modeling | Penn Treebank (PTB) word-level (val) | Perplexity 56.5 | 11 |
| Word Ordering | Penn Treebank (test) | BLEU 34.5 | 11 |
| Part-of-speech tagging | Penn Treebank POS (test) | F1 Score 97.58 | 10 |
| Character-level language modeling | Penn Treebank character-level (val) | BPC 1.24 | 10 |
| Constituency Parsing | Penn Treebank WSJ section 22 (dev) | F1 Score 93.5 | 9 |
| 5-way few-shot classification | Penn Treebank v1 (test) | 1-shot Accuracy (random) 72.8 | 8 |