
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

About

Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), sentiment-irrelevant and conflicting information across modalities may hinder further performance improvements. To alleviate this, we present the Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptive Hyper-modality Learning (AHL) module to learn an irrelevance/conflict-suppressing representation from visual and audio features under the guidance of language features at different scales. With the obtained hyper-modality representation, the model can form a complementary and joint representation through multimodal fusion for effective MSA. In practice, ALMT achieves state-of-the-art performance on several popular datasets (e.g., MOSI, MOSEI and CH-SIMS), and abundant ablation studies demonstrate the validity and necessity of our irrelevance/conflict suppression mechanism.
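The abstract does not spell out ALMT's architecture, but its core idea, using language features as attention queries that select sentiment-relevant content from the audio and visual streams, can be illustrated with a minimal NumPy sketch. All shapes, feature names, and the simple concatenation fusion below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    # scaled dot-product attention: each query position attends
    # over all key/value positions of the other modality
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ value

# toy per-frame features, shape (sequence_length, feature_dim) -- hypothetical
rng = np.random.default_rng(0)
lang = rng.normal(size=(6, 16))    # language features (the guiding queries)
audio = rng.normal(size=(10, 16))  # audio features
video = rng.normal(size=(8, 16))   # visual features

# language-guided "hyper-modality" representation: language queries pull
# sentiment-relevant information from audio and visual features, which can
# suppress irrelevant or conflicting content in those streams
hyper = cross_attention(lang, audio, audio) + cross_attention(lang, video, video)

# a crude stand-in for the paper's multimodal fusion step
fused = np.concatenate([lang, hyper], axis=-1)
print(hyper.shape, fused.shape)  # (6, 16) (6, 32)
```

In the actual model this guidance is applied at multiple feature scales with learned projections inside a Transformer; the sketch shows only a single-scale, parameter-free version of the mechanism.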

Haoyu Zhang, Yu Wang, Guanghao Yin, Kejun Liu, Yuanyuan Liu, Tianshu Yu • 2023

Related benchmarks

Task                           Dataset          Result                    Rank
Multimodal Sentiment Analysis  MOSEI            MAE 0.55                  168
Multimodal Sentiment Analysis  CMU-MOSI         --                        144
Multimodal Sentiment Analysis  MOSI             MAE 0.721                 132
Multimodal Sentiment Analysis  CH-SIMS (test)   F1 Score 81.57            108
Multimodal Sentiment Analysis  SIMS (test)      Accuracy (2-class) 81.91  78
Multimodal Sentiment Analysis  MOSEI (test)     MAE 0.526                 49
Multimodal Sentiment Analysis  MOSI (test)      MAE 0.683                 34
Multimodal Sentiment Analysis  CH-SIMS          F1 Score 77.6             32
Multimodal Sentiment Analysis  SIMS V2          Accuracy (2-class) 79.59  17
Multimodal Sentiment Analysis  SIMS             MAE 0.408                 10
