Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
About
Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained with several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.
Yao Lu, Yue Dong, Laurent Charlin • 2020
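The task described above (a paper's abstract plus the articles it references as input, its related-work section as target) can be pictured as a simple record. The following is a minimal sketch only: the class name, field names, and the separator string are hypothetical choices for illustration and are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RelatedWorkExample:
    """One hypothetical instance; field names are illustrative, not the dataset's schema."""
    abstract: str               # abstract of the paper whose related work is being written
    reference_texts: List[str]  # text of each article that the paper references
    related_work: str           # target summary: the paper's related-work section


def build_source(example: RelatedWorkExample, sep: str = " ||| ") -> str:
    """Join the paper's abstract and its referenced articles into one input string.

    The separator is an arbitrary assumption; a real model would use its own
    document-boundary convention.
    """
    return sep.join([example.abstract, *example.reference_texts])


example = RelatedWorkExample(
    abstract="We propose a new summarization dataset.",
    reference_texts=["Prior dataset A.", "Prior model B."],
    related_work="Earlier work introduced dataset A and model B.",
)
source = build_source(example)
```

Here `source` is the multi-document input a summarizer would read, and `example.related_work` is the text it would be trained to produce.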
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Text Summarization | DUC 2004 (test) | ROUGE-1 | 24.1 | 115 |
| Summarization | arXiv | ROUGE-2 | 10.78 | 76 |
| Abstractive Summarization | Multi-News | ROUGE-2 | 14.77 | 47 |
| Multi-document summarization | Multi-News (test) | ROUGE-2 | 6.2 | 45 |
| Multi-document summarization | WCEP (test) | R-1 | 20.2 | 27 |
| Multi-document summarization | WikiSUM (test) | ROUGE-1 | 21.6 | 14 |
| Summarization | Multi-XScience | R-1 | 31.17 | 12 |
| Summarization | Wikisum | ROUGE-1 | 32.97 | 12 |
| Summarization | WCEP | ROUGE-1 | 41.34 | 12 |
| Multi-document summarization | Multi-XSci (test) | ROUGE-1 | 34.11 | 11 |
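The benchmark results above are ROUGE scores, which measure n-gram overlap between a generated summary and a reference. As a rough illustration of the idea, here is a simplified ROUGE-1 F1 sketch. It assumes lowercased whitespace tokenization and no stemming, so it will not reproduce the numbers of official ROUGE toolkits, and the source does not state which ROUGE variant (recall or F1) each benchmark reports.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference.

    Assumption: tokens are lowercased whitespace-separated words; official
    implementations apply their own tokenization and optional stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat", "the cat sat on")
```

The function returns a value between 0 and 1; leaderboards typically report the same quantity multiplied by 100.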