
Learning Language-guided Adaptive Hyper-modality Representation for Multimodal Sentiment Analysis

About

Though Multimodal Sentiment Analysis (MSA) proves effective by utilizing rich information from multiple sources (e.g., language, video, and audio), potential sentiment-irrelevant and conflicting information across modalities may hinder further performance improvements. To alleviate this, we present the Adaptive Language-guided Multimodal Transformer (ALMT), which incorporates an Adaptive Hyper-modality Learning (AHL) module to learn an irrelevance/conflict-suppressing representation from visual and audio features under the guidance of language features at different scales. With the obtained hyper-modality representation, the model can derive a complementary and joint representation through multimodal fusion for effective MSA. In practice, ALMT achieves state-of-the-art performance on several popular datasets (e.g., MOSI, MOSEI, and CH-SIMS), and abundant ablation studies demonstrate the validity and necessity of our irrelevance/conflict suppression mechanism.
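
For intuition, below is a minimal sketch (in PyTorch) of what language-guided hyper-modality learning could look like: language features act as attention queries over audio and visual features, so sentiment-irrelevant or conflicting audio/visual content receives low attention weight. The module and parameter names (AHLBlock, d_model, n_heads) and the stacking of three blocks as "scales" are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of language-guided hyper-modality learning.
# Names and structure are assumptions for clarity, not the ALMT codebase.
import torch
import torch.nn as nn


class AHLBlock(nn.Module):
    """One language-guided attention block: language features act as queries
    that select sentiment-relevant content from audio/visual features."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hyper, lang, audio, visual):
        # Queries come from the language stream; keys/values come from the
        # audio and visual streams, so irrelevant or conflicting audio/visual
        # content is down-weighted by the attention.
        a, _ = self.attn_a(lang, audio, audio)
        v, _ = self.attn_v(lang, visual, visual)
        # Accumulate into the hyper-modality representation.
        return self.norm(hyper + a + v)


# Toy usage: batch of 2 samples, 16 time steps, 128-dim features per modality.
if __name__ == "__main__":
    lang, audio, visual = (torch.randn(2, 16, 128) for _ in range(3))
    hyper = torch.zeros_like(lang)
    blocks = nn.ModuleList(AHLBlock() for _ in range(3))  # three "scales"
    for blk in blocks:
        hyper = blk(hyper, lang, audio, visual)
    print(hyper.shape)  # torch.Size([2, 16, 128])
```

The resulting hyper-modality representation would then be fused with the language features (e.g., via a further cross-modal transformer) before the sentiment prediction head.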

Haoyu Zhang, Yu Wang, Guanghao Yin, Kejun Liu, Yuanyuan Liu, Tianshu Yu • 2023

Related benchmarks

Task                           Dataset           Result               Rank
Multimodal Sentiment Analysis  CMU-MOSI          MAE 0.683            59
Multimodal Sentiment Analysis  MOSEI (test)      MAE 0.526            49
Multimodal Sentiment Analysis  MOSI (test)       MAE 0.683            34
Multimodal Sentiment Analysis  SIMS (test)       MAE 0.5912           22
Multimodal Sentiment Analysis  CH-SIMS           F1 Score 77.6        18
Multimodal Sentiment Analysis  MOSI              F1 Score 85.1        12
Multimodal Sentiment Analysis  CH-SIMS (test)    Acc (2-class) 81.19  8
