Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
About
Most existing cross-lingual summarization (CLS) work constructs CLS corpora by directly translating pre-annotated summaries from one language to another, which can introduce errors from both the summarization and the translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark built with a new annotation schema that explicitly considers the source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, each covering 3 language directions. We conduct a thorough analysis of ConvSumX and 3 widely used, manually annotated CLS corpora and empirically find that ConvSumX is more faithful to the input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both the conversation and the summary as input to simulate the human annotation process. Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
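The 2-Step idea described above can be illustrated with a minimal sketch. It assumes that the first step produces a source-language summary and that the second step receives that summary concatenated with the conversation. The separator string, the function names, and the two model callables are hypothetical placeholders, not the authors' implementation; the abstract does not specify how the two inputs are combined.

```python
from typing import Callable

# Hypothetical separator; the source does not say how the inputs are joined.
SEP = " </s> "


def build_step2_input(conversation: str, mono_summary: str, sep: str = SEP) -> str:
    """Join the source-language summary and the conversation into one input string."""
    return mono_summary.strip() + sep + conversation.strip()


def two_step_summarize(
    conversation: str,
    summarizer: Callable[[str], str],
    cross_lingual_model: Callable[[str], str],
) -> str:
    """Run an assumed two-stage pipeline with caller-supplied models."""
    # Step 1 (assumed): summarize the conversation in the source language.
    mono_summary = summarizer(conversation)
    # Step 2: produce the target-language summary from summary + conversation,
    # so the second model can check the summary against the original context.
    return cross_lingual_model(build_step2_input(conversation, mono_summary))


if __name__ == "__main__":
    # Toy stand-ins for real models, used only to show the data flow.
    toy_summarizer = lambda text: "Amy greets Bob."
    toy_cls_model = lambda text: "[zh] " + text
    print(two_step_summarize("Amy: Hi Bob!\nBob: Hi.", toy_summarizer, toy_cls_model))
```

Passing the models in as plain callables keeps the sketch independent of any particular library; any summarizer and any multilingual sequence-to-sequence model could be plugged in.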
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Cross-lingual Summarization | En2ZhSum (test) | ROUGE-1 | 46.87 | 31 |
| Cross-lingual Summarization | XMediaSum40K En2Zh (test) | ROUGE-1 | 29.7 | 7 |
| Cross-lingual Summarization | XMediaSum40K En2De (test) | ROUGE-1 | 27.4 | 7 |
| Cross-lingual Summarization | ConvSumX QX 1.0 (test) | Fluency | 3.35 | 6 |
| Cross-lingual Summarization | ConvSumX DX 1.0 (test) | Fluency | 3.83 | 6 |
| Cross-lingual Summarization | XSAMSum En2Zh (test) | ROUGE-1 | 43.5 | 6 |
| Cross-lingual Summarization | XSAMSum En2De (test) | ROUGE-1 | 46.2 | 6 |
| Cross-lingual Summarization | DialogSumX En2Fr (test) | ROUGE-1 | 46.19 | 4 |
| Cross-lingual Summarization | QMSumX En2Zh (test) | ROUGE-1 | 33.2 | 4 |
| Cross-lingual Summarization | QMSumX En2Fr (test) | ROUGE-1 | 38.91 | 4 |