| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Machine Comprehension | CNN (val) | Accuracy 0.779 | 80 |
| Machine Comprehension | CNN (test) | Accuracy 77.9 | 77 |
| Selective Generation | CNN | ROC-AUC 75.8 | 66 |
| Summarization | CNN | PRR 0.444 | 44 |
| Question Answering | CNN (test) | Accuracy 78.6 | 24 |
| Summarization | CNN | AlignScore 86.4 | 19 |
| Summarization | CNN | ROUGE-1 F-Score 42.2 | 18 |
| Summarization | CNN | BERTScore F 87.8 | 18 |
| Summarization | CNN | ROUGE-1 42.2 | 18 |
| Summarization | CNN | ROUGE-2 F-Score 17.3 | 18 |
| Summarization | CNN out-of-domain (test) | D320 | 16 |
| Selective Generation | CNN | PRR (ROUGE-L) 0.15 | 14 |
| Machine Reading Comprehension | CNN (dev) | Accuracy 77.2 | 13 |
| Summarization | CNN 3.0.0 | ROUGE-L 22.46 | 12 |
| Abstractive Summarization | CNN (test) | ROUGE-1 31.9 | 12 |
| Fact-checking | CNN | Balanced Accuracy 62.1 | 10 |
| Question Answering | CNN (val) | Accuracy 79.2 | 8 |
| Classification | cnn | Accuracy 94 | 8 |
| Bayesian Optimization | CNN hyperparameter tuning | Mean Stopping Iteration 6 | 7 |
| Multimodal Summarization | CNN | R-1 Score 30.82 | 7 |
| Global Optimization | CNN D=4, T=256 | Stopping Time 5 | 6 |
| Summarization | CNN (test) | ROUGE-2 14.5 | 6 |
| Keyphrase extraction | CNN (test) | Recall@15 0.336 | 6 |
| Extractive Summarization | CNN (test) | ROUGE-1 30.8 | 5 |
| Summarization | CNN non-anonymized (test) | ROUGE-1 30.4 | 5 |