Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach
About
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most existing studies assume that all modalities are available during both training and testing, which leaves their algorithms vulnerable in missing-modality scenarios. In this paper, we propose a novel knowledge-transfer network that translates between modalities to reconstruct the missing audio features. Moreover, we develop a cross-modality attention mechanism to maximize the information extracted from the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets show that the approach improves significantly over baseline methods and achieves results comparable to previous methods trained with complete multi-modality supervision.
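To make the idea of attending across modalities concrete, here is a minimal sketch using generic scaled dot-product attention, in which features from an observed modality (text) query features from a reconstructed modality (audio). This is not the paper's mechanism, whose details the abstract does not give. The function names, the toy feature values, and the choice of text as the query modality are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Generic scaled dot-product attention (illustrative, not the paper's).

    queries come from one modality; keys and values come from another.
    Each argument is a list of equal-width feature vectors.
    """
    dim = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query step to every key step, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        attended.append(out)
    return attended


# Hypothetical toy features: 2 text steps attend over 3 reconstructed-audio steps.
text = [[1.0, 0.0], [0.0, 1.0]]
audio_hat = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(text, audio_hat, audio_hat)
```

Each row of `fused` is a weighted average of the reconstructed audio vectors, with more weight on the audio steps that are most similar to the corresponding text step. A real model would typically apply learned projections to the queries, keys, and values before this step.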
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSI Word Aligned (test) | Accuracy (7-Class) | 39.7 | 30 |
| Multimodal Sentiment Analysis | CMU-MOSEI Word Aligned (test) | Accuracy (7-Class) | 51.1 | 22 |
| Multimodal Sentiment Analysis | CMU-MOSI Unaligned (test) | Accuracy (7-Class) | 39.3 | 20 |
| Multimodal Emotion Recognition | IEMOCAP Word Aligned (test) | Happy Accuracy | 90.3 | 16 |
| Multimodal Emotion Recognition | IEMOCAP Unaligned (test) | Happy Accuracy | 84.8 | 12 |
| Multimodal Sentiment Analysis | CMU-MOSEI unaligned | Accuracy (7-Class) | 49.7 | 7 |