PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
About
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data. PRIMERA uses a newly proposed pre-training objective designed to teach the model to connect and aggregate information across documents. It also uses efficient encoder-decoder transformers to simplify the processing of concatenated input documents. In extensive experiments on 6 multi-document summarization datasets from 3 different domains under zero-shot, few-shot, and fully-supervised settings, PRIMERA outperforms current state-of-the-art dataset-specific and pre-trained models in most of these settings, by large margins. The code and pre-trained models can be found at [https://github.com/allenai/PRIMER](https://github.com/allenai/PRIMER).
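The pre-training objective masks whole sentences that carry information shared across the document cluster, so the model must consult the other documents to reconstruct them. The sketch below is a simplified, hypothetical illustration of that idea: it scores each sentence by word overlap with the other documents (a crude stand-in for PRIMERA's entity-based salience scoring), masks the highest-scoring sentences, and returns the masked cluster plus the reconstruction targets. The function names, the `<sent-mask>` token, and the overlap heuristic are assumptions for illustration, not the paper's exact procedure.

```python
def sentence_salience(sentence, other_docs):
    # Simplified proxy for cross-document salience: the fraction of this
    # sentence's words that also appear anywhere in the other documents.
    # (PRIMERA itself uses an entity-pyramid-style scoring; this is a sketch.)
    words = {w.lower() for w in sentence.split()}
    if not words:
        return 0.0
    other_words = set()
    for doc in other_docs:
        for sent in doc:
            other_words.update(w.lower() for w in sent.split())
    return len(words & other_words) / len(words)


def mask_salient_sentences(docs, ratio=0.3, mask_token="<sent-mask>"):
    # docs: a cluster of documents, each a list of sentence strings.
    # Score every sentence against the *other* documents in the cluster,
    # mask the top `ratio` fraction, and return (masked docs, targets).
    scored = []
    for i, doc in enumerate(docs):
        others = docs[:i] + docs[i + 1:]
        for j, sent in enumerate(doc):
            scored.append((sentence_salience(sent, others), i, j))
    n_mask = max(1, int(ratio * len(scored)))
    to_mask = {(i, j) for _, i, j in sorted(scored, reverse=True)[:n_mask]}
    masked = [[mask_token if (i, j) in to_mask else s
               for j, s in enumerate(doc)] for i, doc in enumerate(docs)]
    targets = [docs[i][j] for i, j in sorted(to_mask)]
    return masked, targets
```

During pre-training the masked cluster would be concatenated into a single input for the encoder-decoder model, with the masked sentences as the generation target, which is what pushes the model to aggregate information across documents.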
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Summarization | arXiv (test) | ROUGE-1: 47.6 | 161 |
| Text Summarization | DUC 2004 (test) | ROUGE-1: 35.1 | 115 |
| Summarization | arXiv | ROUGE-2: 20.8 | 76 |
| Document Summarization | GovReport (test) | ROUGE-1: 55.1 | 50 |
| Abstractive Summarization | Multi-News | ROUGE-2: 21.1 | 47 |
| Multi-document summarization | Multi-News (test) | ROUGE-2: 21.1 | 45 |
| Multi-document summarization | WCEP (test) | ROUGE-1: 46.08 | 27 |
| Long document summarization | arXiv (test) | ROUGE-2: 20.8 | 24 |
| Summarization | SummScreen (test) | ROUGE-1: 32.3 | 17 |
| Multi-document summarization | WikiSUM (test) | ROUGE-1: 28 | 14 |