Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
About
An accurate abstractive summary of a document should contain all of its salient information and should be logically entailed by the input document. We improve these important aspects of abstractive summarization via multi-task learning with the auxiliary tasks of question generation and entailment generation, where the former teaches the summarization model how to look for salient question-worthy details, and the latter teaches the model how to rewrite a summary that is a directed-logical subset of the input document. We also propose novel multi-task architectures with high-level (semantic) layer-specific sharing across multiple encoder and decoder layers of the three tasks, as well as soft-sharing mechanisms (and show performance ablations and analysis examples of each contribution). Overall, we achieve statistically significant improvements over the state-of-the-art on both the CNN/DailyMail and Gigaword datasets, as well as on the DUC-2002 transfer setup. We also present several quantitative and qualitative analysis studies of our model's learned saliency and entailment skills.
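The soft-sharing idea described above can be sketched as a regularization term that pulls corresponding layer parameters of the summarization model and an auxiliary-task model toward each other, instead of forcing them to be identical (hard sharing). The following is a minimal illustrative sketch in plain Python; the function name, parameter layout, and coefficient value are hypothetical and not taken from the paper's code.

```python
def soft_sharing_penalty(summ_params, aux_params):
    """Squared L2 distance between corresponding shared-layer parameters
    of the summarization model and an auxiliary task's model.
    Each argument is a list of layers, each layer a list of weights."""
    return sum((s - a) ** 2
               for layer_s, layer_a in zip(summ_params, aux_params)
               for s, a in zip(layer_s, layer_a))

# Toy example: two shared layers with a handful of weights each
# (values are illustrative only).
summ = [[0.5, 1.0], [2.0]]   # summarization model's shared layers
qg   = [[0.4, 1.2], [1.5]]   # question-generation model's shared layers

gamma = 0.1                  # penalty coefficient (hypothetical value)
penalty = gamma * soft_sharing_penalty(summ, qg)
```

The penalty would be added to the summarization task's loss, so gradient descent keeps the shared layers close across tasks while still letting each task specialize.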
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Summarization | CNN/Daily Mail original, non-anonymized (test) | ROUGE-1: 39.81 | 54 |
| Abstractive Summarization | CNN/DailyMail full-length F-1 (test) | ROUGE-1: 39.84 | 48 |
| Summarization | Gigaword | ROUGE-L: 33.63 | 38 |
| Summarization | Gigaword (test) | ROUGE-2: 17.76 | 38 |
| Question Generation | SQuAD (test) | -- | 22 |
| Summarization | DUC 2002 (test) | ROUGE-1: 36.73 | 18 |
| Abstractive Text Summarization | Gigaword | ROUGE-1: 35.98 | 14 |
| Summarization | CNN/DailyMail human evaluation (100 samples) | Relevance Score: 43 | 6 |
| Entailment Generation | SNLI (test) | METEOR: 32.4 | 3 |
| Entailment Classification | CNN/DailyMail (test) | Avg. Entailment Probability: 91.2 | 2 |
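Most results in the table are ROUGE scores, which measure n-gram overlap between a candidate summary and a reference. As a rough illustration of what ROUGE-1 F-1 computes, here is a minimal sketch (unigram overlap only; real evaluations use the official ROUGE toolkit with stemming and other options, which this sketch omits):

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap F-1 between a candidate and a reference string.
    Overlap counts are clipped by Counter intersection, as in ROUGE-1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
# 5 of 6 unigrams match in both directions, so P = R = F-1 = 5/6
```

ROUGE-2 and ROUGE-L in the table are analogous, using bigram overlap and longest-common-subsequence length respectively.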