| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Machine Comprehension | CNN (val) | Accuracy 0.779 | 80 |
| Machine Comprehension | CNN (test) | Accuracy 77.9 | 77 |
| Question Answering | CNN (test) | Accuracy 78.6 | 24 |
| Summarization | CNN | AlignScore 86.4 | 19 |
| Summarization | CNN | ROUGE-1 F-Score 42.2 | 18 |
| Summarization | CNN | BERTScore F 87.8 | 18 |
| Summarization | CNN | ROUGE-1 42.2 | 18 |
| Summarization | CNN | ROUGE-2 F-Score 17.3 | 18 |
| Summarization | CNN out-of-domain (test) | D320 | 16 |
| Machine Reading Comprehension | CNN (dev) | Accuracy 77.2 | 13 |
| Summarization | CNN 3.0.0 | ROUGE-L 22.46 | 12 |
| Abstractive Summarization | CNN (test) | ROUGE-1 31.9 | 12 |
| Question Answering | CNN (val) | Accuracy 79.2 | 8 |
| Classification | cnn | Accuracy 94 | 8 |
| Multimodal Summarization | CNN | R-1 Score 30.82 | 7 |
| Global Optimization | CNN D=4, T=256 | Stopping Time 5 | 6 |
| Summarization | CNN (test) | ROUGE-2 14.5 | 6 |
| Keyphrase extraction | CNN (test) | Recall@15 0.336 | 6 |
| Extractive Summarization | CNN (test) | ROUGE-1 30.8 | 5 |
| Summarization | CNN non-anonymized (test) | ROUGE-1 30.4 | 5 |
| Extractive Summarization | CNN full-length F1 (test) | ROUGE-1 30.7 | 4 |
| Question Answering | CNN full task (test) | Error Rate 0.363 | 3 |
| Question Answering | CNN full task (val) | Error Rate 37.9 | 3 |