
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

About

Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can introduce errors from both the summarization and the translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark built with a new annotation schema that explicitly considers the source input context. ConvSumX consists of two sub-tasks covering different real-world scenarios, each with three language directions. We conduct a thorough analysis of ConvSumX and three widely used, manually annotated CLS corpora, and empirically find that ConvSumX is more faithful to the input text. Additionally, based on the same intuition, we propose a 2-Step method that takes both the conversation and the summary as input to simulate the human annotation process. Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Further analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
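The abstract does not say how the 2-Step method combines its two inputs. The Python sketch below is one hypothetical reading, not the authors' implementation: step 1 produces a source-language summary, and step 2 feeds the conversation and that summary together to a cross-lingual generator. The function names, the input order, and the `" </s> "` separator are all illustrative assumptions; the two model calls are passed in as plain callables.

```python
from typing import Callable


def build_step2_input(conversation: str, src_summary: str, sep: str = " </s> ") -> str:
    # Hypothetical input format: conversation first, then the source-language
    # summary, joined by an assumed separator token.
    return conversation + sep + src_summary


def two_step_summarize(
    conversation: str,
    summarize_src: Callable[[str], str],  # assumed step-1 model: conversation -> source summary
    generate_tgt: Callable[[str], str],   # assumed step-2 model: combined input -> target summary
    sep: str = " </s> ",
) -> str:
    # Step 1: summarize the conversation in the source language.
    src_summary = summarize_src(conversation)
    # Step 2: condition the cross-lingual generator on BOTH the conversation
    # and the step-1 summary, mirroring the idea of annotating with context.
    return generate_tgt(build_step2_input(conversation, src_summary, sep))


# Usage with toy stand-ins for the two models (identity generator shows the input).
if __name__ == "__main__":
    out = two_step_summarize(
        "Amy: I'll bring the cake.",
        summarize_src=lambda text: "Amy brings cake.",
        generate_tgt=lambda text: text,
    )
    print(out)
```

Keeping the models as injected callables leaves the sketch agnostic to whichever seq2seq model is used for each step.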

Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Michael Zhu, Yue Zhang · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cross-lingual Summarization | En2ZhSum (test) | ROUGE-1 | 46.87 | 31 |
| Cross-lingual Summarization | XMediaSum40K En2Zh (test) | ROUGE-1 | 29.7 | 7 |
| Cross-lingual Summarization | XMediaSum40K En2De (test) | ROUGE-1 | 27.4 | 7 |
| Cross-lingual Summarization | ConvSumX QX 1.0 (test) | Fluency | 3.35 | 6 |
| Cross-lingual Summarization | ConvSumX DX 1.0 (test) | Fluency | 3.83 | 6 |
| Cross-lingual Summarization | XSAMSum En2Zh (test) | ROUGE-1 | 43.5 | 6 |
| Cross-lingual Summarization | XSAMSum En2De (test) | ROUGE-1 | 46.2 | 6 |
| Cross-lingual Summarization | DialogSumX En2Fr (test) | ROUGE-1 | 46.19 | 4 |
| Cross-lingual Summarization | QMSumX En2Zh (test) | ROUGE-1 | 33.2 | 4 |
| Cross-lingual Summarization | QMSumX En2Fr (test) | ROUGE-1 | 38.91 | 4 |

(10 of 12 rows shown.)
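Most rows above report ROUGE-1, which measures unigram overlap between a generated summary and a reference. As a rough illustration, here is a minimal ROUGE-1 F1 sketch in Python, assuming plain whitespace tokenization. Official ROUGE packages add their own tokenization and optional stemming, and Chinese outputs are often scored at the character level, so this will not reproduce the leaderboard numbers exactly; those numbers are also typically reported scaled by 100.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    # Assumption: whitespace tokenization; real toolkits tokenize differently.
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped unigram overlap: each token counts at most min(cand, ref) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    # F1 is the harmonic mean of unigram precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 3 shared unigrams: precision 3/3, recall 3/4, F1 = 6/7.
    print(rouge1_f1("the cat sat", "the cat sat down"))
```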
