SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

About

This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than the model-generated summaries of news -- in contrast with human evaluators' judgement. This suggests that a challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogues corpus, manually annotated with abstractive summarizations, which can be used by the research community for further studies.
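For readers unfamiliar with the metric, the ROUGE-2 scores discussed here measure bigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that idea, not the authors' evaluation code or the official ROUGE toolkit: it assumes lowercase whitespace tokenization with no stemming, and the function names `bigrams` and `rouge2` as well as the example sentences are hypothetical.

```python
from collections import Counter


def bigrams(text):
    # Simplifying assumption: lowercase whitespace tokenization, no stemming.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(reference, candidate):
    """Return (precision, recall, f1) of bigram overlap."""
    ref, cand = bigrams(reference), bigrams(candidate)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((ref & cand).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Made-up sentences, not taken from the corpus.
p, r, f = rouge2("anna will bring the book tomorrow",
                 "anna will bring a book tomorrow")
print(round(p, 2), round(r, 2), round(f, 2))
```

Because the score only counts surface overlap, a summary can share many bigrams with the reference and still misreport who said or did what, which is one way a high ROUGE score can diverge from human judgement.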

Bogdan Gliwa, Iwona Mochol, Maciej Biesek, Aleksander Wawer • 2019

Related benchmarks

Task                       Dataset                    Metric        Result  Rank
Dialogue Summarization     SamSum (test)              ROUGE-2       20.65   80
Abstractive Summarization  SamSum                     ROUGE-2       25.6    73
Summarization              EMAILSUM short 1.0 (test)  R1            35.57   19
Summarization              EMAILSUM long 1.0 (test)   ROUGE-1 (R1)  42.96   19
Generative Task            Wiki Auto                  BLEU Score    0.3069  4
Generative Task            emdg                       BS            51.36   4
Generative Task            esnli                      BS Score      58.28   4
Generative Task            haiku                      ROUGE Score   25.25   4
Generative Task            covid qa                   BS            49.93   4
Generative Task            ELI5                       BS            47.94   4
Showing 10 of 12 rows
