
Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

About

Representation learning is a significant and challenging task in multimodal learning. Effective modality representations should contain two kinds of characteristics: consistency and difference. Because of the unified multimodal annotation, existing methods are restricted in capturing differentiated information, yet additional unimodal annotations are costly in both time and labor. In this paper, we design a label generation module based on a self-supervised learning strategy to acquire independent unimodal supervisions. We then jointly train the multimodal and unimodal tasks to learn the consistency and the difference, respectively. Moreover, during the training stage, we design a weight-adjustment strategy to balance the learning progress among the different subtasks, guiding each subtask to focus on samples with a larger difference between modality supervisions. Finally, we conduct extensive experiments on three public multimodal baseline datasets. The experimental results validate the reliability and stability of the auto-generated unimodal supervisions. On the MOSI and MOSEI datasets, our method surpasses the current state-of-the-art methods. On the SIMS dataset, our method achieves performance comparable to that obtained with human-annotated unimodal labels. The full code is available at https://github.com/thuiar/Self-MM.
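The abstract describes a weight-adjustment strategy that makes each unimodal subtask focus on samples whose auto-generated unimodal label diverges more from the shared multimodal label. A minimal sketch of that idea (not the paper's actual code; the function and variable names are hypothetical, and plain squared error stands in for the paper's loss):

```python
# Illustrative sketch: joint multi-task loss where each unimodal subtask
# is weighted by how far its auto-generated label diverges from the
# shared multimodal label. All names here are hypothetical.

def weighted_multitask_loss(pred_m, label_m, uni_preds, uni_labels):
    """pred_m, label_m: multimodal prediction and human label (floats).
    uni_preds, uni_labels: dicts keyed by modality, e.g. 'text', 'audio',
    'vision', holding unimodal predictions and auto-generated labels."""
    # Multimodal task: standard regression loss against the human label.
    loss = (pred_m - label_m) ** 2
    for mod in uni_preds:
        # Weight each unimodal term by the gap between its generated
        # label and the multimodal label, so subtasks concentrate on
        # samples carrying modality-specific (differentiated) information.
        w = abs(uni_labels[mod] - label_m)
        loss += w * (uni_preds[mod] - uni_labels[mod]) ** 2
    return loss
```

When a generated unimodal label agrees with the multimodal label, its weight shrinks toward zero and the subtask contributes little; when they disagree, the subtask's gradient is amplified for that sample.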

Wenmeng Yu, Hua Xu, Ziqi Yuan, Jiele Wu • 2021

Related benchmarks

Task                                                      Dataset                     Metric              Result  Rank
Multimodal Sentiment Analysis                             CMU-MOSI (test)             F1                  85.95   238
Multimodal Sentiment Analysis                             CMU-MOSEI (test)            F1 Score            85.2    206
Multimodal Sentiment Analysis                             CMU-MOSI                    MAE                 0.712   59
Multimodal Sentiment Analysis                             MOSEI (test)                MAE                 0.529   49
Emotion Recognition                                       IEMOCAP (test)              Score (l)           0.687   36
Multimodal Sentiment Analysis                             MOSI (test)                 MAE                 0.712   34
Multimodal Sentiment Analysis                             CH-SIMS V2                  Accuracy (2-Class)  78.7    29
Emotion Recognition (ER) Valence and Arousal Regression   EMER (test)                 Arousal MAE         0.244   26
Multimodal Sentiment Analysis                             SIMS (test)                 MAE                 0.458   22
Multimodal Sentiment Analysis                             CMU-MOSEI segments (test)   ACC2                85.3    22
Showing 10 of 25 rows.
