Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization

About

Multimodal abstractive summarization (MAS) aims to produce a concise summary from multimodal data (text and vision). Existing studies mainly focus on how to use visual features effectively from the perspective of the article, and have achieved impressive results on the high-resource English dataset. However, less attention has been paid to visual features from the perspective of the summary, which may limit model performance, especially in low- and zero-resource scenarios. In this paper, we propose to improve summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks: a vision-to-summary task and a masked image modeling task. Together with the main summarization task, we optimize the MAS model via the training objectives of all three tasks. In this way, the MAS model is enhanced by capturing summary-oriented visual features, thereby yielding more accurate summaries. Experiments on 44 languages, covering mid-high-, low-, and zero-resource scenarios, verify the effectiveness and superiority of the proposed approach, which achieves state-of-the-art performance under all scenarios. Additionally, we will contribute a large-scale multilingual multimodal abstractive summarization (MM-Sum) dataset.
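
The abstract says the model is optimized through the objectives of the main summarization task and the two auxiliary tasks together. A minimal sketch of one way such a joint objective could be combined is given below. The weighted-sum form, the weights `alpha` and `beta`, and the names `TaskLosses` and `joint_objective` are assumptions made for illustration; the abstract does not state how the three objectives are weighted.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-batch scalar losses for the three tasks (hypothetical container)."""
    summarization: float          # main MAS task
    vision_to_summary: float      # auxiliary task 1
    masked_image_modeling: float  # auxiliary task 2


def joint_objective(losses: TaskLosses, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Combine the main loss with the two auxiliary losses.

    Assumption: a simple weighted sum. alpha and beta are illustrative
    hyperparameters, not values taken from the paper.
    """
    return (
        losses.summarization
        + alpha * losses.vision_to_summary
        + beta * losses.masked_image_modeling
    )


if __name__ == "__main__":
    batch = TaskLosses(summarization=2.0, vision_to_summary=0.5, masked_image_modeling=0.25)
    print(joint_objective(batch))                       # equal weighting
    print(joint_objective(batch, alpha=0.5, beta=2.0))  # re-weighted auxiliaries
```

In a real training loop the three values would be differentiable tensors rather than floats, and the combined value would be the quantity that is backpropagated.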

Yunlong Liang, Fandong Meng, Jinan Xu, Jiaan Wang, Yufeng Chen, Jie Zhou • 2022

Related benchmarks

Task                                  Dataset                                Result               Rank
Multimodal Summarization              MM-Sum low-resource 1.0                ROUGE-1 47.96        96
Multimodal Summarization              MM-Sum Zero-Resource Languages (test)  ROUGE-1 Score 32.97  96
Multimodal Abstractive Summarization  MM-Sum mid-high-resource (test)        ROUGE-1 41.59        90
Multimodal Abstractive Summarization  MM-Sum mid-high-resource               ROUGE-1 41.51        90
Multimodal Abstractive Summarization  How2 (test)                            ROUGE-1 67.7         13
Multimodal Abstractive Summarization  MM-Sum Bengali low-resource            ROUGE-1 28.58        6
Multimodal Abstractive Summarization  MM-Sum French, low-resource            ROUGE-1 35.93        6
Multimodal Abstractive Summarization  MM-Sum Gujarati, low-resource          ROUGE-1 22.18        6
Multimodal Abstractive Summarization  MM-Sum Hausa low-resource              ROUGE-1 39.28        6
Multimodal Abstractive Summarization  MM-Sum Japanese low-resource           ROUGE-1 Score 47.79  6
Showing 10 of 23 rows
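
All results listed above are ROUGE-1 scores, which measure unigram overlap between a generated summary and a reference summary. The sketch below shows a minimal version of that computation. It assumes lowercased whitespace tokenization and reports precision, recall, and F1 on a 0 to 1 scale; the listing does not say which variant or tokenizer produced the numbers above (which appear to be on a 0 to 100 scale), and multilingual evaluation usually needs language-specific tokenization. The function name `rouge1` is illustrative.

```python
from collections import Counter
from typing import Tuple


def rouge1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of unigram overlap.

    Assumption: tokens are lowercased and split on whitespace.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge1("the cat sat", "the cat sat on the mat")
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```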
