SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
About
This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, in contrast with human evaluators' judgement. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogue corpus, manually annotated with abstractive summaries, which can be used by the research community for further studies.
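Since the abstract leans on ROUGE scores, a minimal sketch of how a ROUGE-N score can be computed may help. This is a simplified illustration, not the evaluation code used in the paper: it assumes lowercased whitespace tokenization, applies no stemming, and the function names `ngrams` and `rouge_n` and the example sentences are made up for this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-N: clipped n-gram overlap as (precision, recall, F1).

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Made-up example pair, not taken from the corpus.
reference = "Anna will bring the book to Tom tomorrow"
candidate = "Anna will bring Tom the book"
precision, recall, f1 = rouge_n(candidate, reference, n=2)
print(f"ROUGE-2 P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because the score only counts overlapping word sequences, a summary can share many n-grams with the reference and still misstate who did what, which is one way a ROUGE score can diverge from human judgement. Benchmark tables often report the value multiplied by 100.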
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialogue Summarization | SamSum (test) | ROUGE-2 | 20.65 | 80 |
| Abstractive Summarization | SamSum | ROUGE-2 | 25.6 | 73 |
| Summarization | EMAILSUM short 1.0 (test) | ROUGE-1 (R1) | 35.57 | 19 |
| Summarization | EMAILSUM long 1.0 (test) | ROUGE-1 (R1) | 42.96 | 19 |
| Generative Task | Wiki Auto | BLEU Score | 0.3069 | 4 |
| Generative Task | emdg | BS | 51.36 | 4 |
| Generative Task | esnli | BS Score | 58.28 | 4 |
| Generative Task | haiku | ROUGE Score | 25.25 | 4 |
| Generative Task | covid qa | BS | 49.93 | 4 |
| Generative Task | ELI5 | BS | 47.94 | 4 |