UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization
About
The high annotation costs and diverse demands of various summarization tasks motivate the development of few-shot summarization. However, despite the emergence of many summarization tasks and datasets, the current training paradigm for few-shot summarization systems ignores potentially shareable knowledge across heterogeneous datasets. To this end, we propose **UniSumm**, a unified few-shot summarization model that is pre-trained on multiple summarization tasks and can be prefix-tuned to excel at any few-shot summarization task. Meanwhile, to better evaluate few-shot summarizers under the principles of diversity and robustness, we assemble and release a new benchmark, **SummZoo**. It consists of 8 summarization tasks, each with multiple sets of few-shot samples, covering diverse domains. Experimental results and analysis show that **UniSumm** outperforms strong baselines by a large margin across all sub-tasks in **SummZoo** in both automatic and human evaluation, and achieves results comparable to a GPT-3.5 model in human evaluation.
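The adaptation mechanism named above, prefix-tuning, keeps the pre-trained backbone frozen and trains only a small set of task-specific prefix vectors that are prepended to the attention keys and values. The following is a minimal single-layer sketch of that idea in plain NumPy; all names, shapes, and the single-head setup are illustrative assumptions, not the UniSumm implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, prefix_len = 16, 6, 4  # assumed toy dimensions

# Frozen projections, standing in for the multi-task pre-trained model.
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

# The only parameters trained for a new few-shot task:
# per-layer prefix keys and values.
prefix_k = rng.standard_normal((prefix_len, d))
prefix_v = rng.standard_normal((prefix_len, d))

def prefix_attention(x):
    """Self-attention where the trainable prefix is prepended to K and V."""
    q = x @ W_q
    k = np.vstack([prefix_k, x @ W_k])  # (prefix_len + seq_len, d)
    v = np.vstack([prefix_v, x @ W_v])
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

x = rng.standard_normal((seq_len, d))
out = prefix_attention(x)
print(out.shape)  # (6, 16): output length is unchanged by the prefix
```

Because only `prefix_k` and `prefix_v` would receive gradients, each few-shot task adds just `2 * prefix_len * d` parameters per layer while the shared backbone stays intact.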
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Summarization | Xsum | ROUGE-2: 11.36 | 108 |
| Summarization | arXiv | ROUGE-2: 16.42 | 76 |
| Abstractive Summarization | SamSum | ROUGE-2: 20.65 | 73 |
| Abstractive Summarization | Multi-News | ROUGE-2: 15.86 | 47 |
| Summarization | SamSum | -- | 30 |
| Abstractive Summarization | WikiHow | ROUGE-2: 11.73 | 26 |
| Summarization | MultiNews (test) | -- | 24 |
| Summarization | DIALOGSUM | ROUGE-2: 15.64 | 17 |
| Summarization | WikiHow (test) | -- | 12 |
| Summarization | SUMMZOO Average | ROUGE-2: 13.97 | 11 |