Multimodal Multi-loss Fusion Network for Sentiment Analysis
About
This paper investigates the optimal selection and fusion of feature encoders across multiple modalities and combines them in one neural network to improve sentiment detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying surprisingly important findings relating to subnet performance. We have also found that integrating context significantly enhances model performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS). These results suggest a roadmap toward an optimized feature selection and fusion approach for enhancing sentiment detection in neural networks.
Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg • 2023
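The abstract describes a fusion network in which each modality subnet is supervised by its own loss in addition to the loss on the fused prediction. The following is a minimal sketch of that multi-loss idea, not the authors' implementation: the encoder layers, feature dimensions, L1 criterion, and loss weights are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of multi-loss training for a
# multimodal fusion network: each modality subnet gets its own prediction
# head and loss, and a fused head gets a combined loss.
import torch
import torch.nn as nn

class MultiLossFusionNet(nn.Module):
    def __init__(self, text_dim=768, audio_dim=74, hidden=128,
                 loss_weights=(1.0, 0.5, 0.5)):
        super().__init__()
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # One regression head per modality subnet plus one for the fused vector.
        self.text_head = nn.Linear(hidden, 1)
        self.audio_head = nn.Linear(hidden, 1)
        self.fusion_head = nn.Linear(2 * hidden, 1)
        self.loss_weights = loss_weights
        self.criterion = nn.L1Loss()  # MAE, common for MOSI/MOSEI sentiment regression

    def forward(self, text_feat, audio_feat, label=None):
        t = self.text_enc(text_feat)
        a = self.audio_enc(audio_feat)
        fused = torch.cat([t, a], dim=-1)  # simple concatenation fusion
        preds = {
            "fusion": self.fusion_head(fused).squeeze(-1),
            "text": self.text_head(t).squeeze(-1),
            "audio": self.audio_head(a).squeeze(-1),
        }
        if label is None:
            return preds
        # Multi-loss: supervise the fused output and each modality subnet.
        w_f, w_t, w_a = self.loss_weights
        loss = (w_f * self.criterion(preds["fusion"], label)
                + w_t * self.criterion(preds["text"], label)
                + w_a * self.criterion(preds["audio"], label))
        return preds, loss

# Toy usage with random tensors standing in for real encoder features.
model = MultiLossFusionNet()
text = torch.randn(8, 768)
audio = torch.randn(8, 74)
labels = torch.empty(8).uniform_(-3, 3)  # MOSI-style sentiment scores in [-3, 3]
_, loss = model(text, audio, labels)
loss.backward()
```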
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSI | -- | -- | 144 |
| Multimodal Sentiment Analysis | CH-SIMS (test) | F1 Score | 80.98 | 108 |
| Emotion Recognition | EAV | Accuracy | 59.6 | 16 |
| Sleep Staging | ISRUC S3 (leave-one-subject-out cross-validation) | Accuracy | 73.33 | 9 |
| Cognitive Task Assessment | Cognitive N-back Task | Accuracy | 45.32 | 9 |
| Word Generation | Cognitive dataset (cross-subject 10-fold cross-validation) | Accuracy (%) | 56.35 | 9 |