COGMEN: COntextualized GNN based Multimodal Emotion recognitioN
About
Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. In a conversation involving several people, a person's emotions are influenced by the other speakers' utterances and by their own emotional state over the course of the conversation. In this paper, we propose the COntextualized Graph Neural Network based Multimodal Emotion recognitioN (COGMEN) system, which leverages local information (i.e., inter/intra-speaker dependencies) and global information (context). The proposed model uses a Graph Neural Network (GNN) based architecture to model these complex dependencies (local and global information) in a conversation. Our model gives state-of-the-art (SOTA) results on the IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.
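To make the graph-based idea concrete, the sketch below shows one round of mean-aggregation message passing over a toy conversation graph, where each utterance is a node and directed edges encode intra-/inter-speaker dependencies. This is only an illustrative sketch of the general GNN technique, not the COGMEN implementation; all names, the toy data, and the aggregation rule are assumptions.

```python
# Illustrative sketch (NOT the COGMEN code): one round of mean-neighbour
# message passing over a conversation graph. Each utterance is a node;
# edges model intra-speaker and inter-speaker dependencies.

def message_pass(features, edges):
    """features: dict node_id -> list[float] (utterance embedding)
    edges:    list of (src, dst) directed dependency pairs
    Returns updated node features after one aggregation step."""
    # Collect incoming messages for every node
    incoming = {n: [] for n in features}
    for src, dst in edges:
        incoming[dst].append(features[src])

    updated = {}
    for node, feat in features.items():
        msgs = incoming[node]
        if not msgs:
            # No neighbours: keep the node's own state
            updated[node] = feat[:]
            continue
        dim = len(feat)
        # Mean of neighbour features, blended equally with the node's own state
        mean = [sum(m[d] for m in msgs) / len(msgs) for d in range(dim)]
        updated[node] = [(feat[d] + mean[d]) / 2 for d in range(dim)]
    return updated

# Toy conversation: utterances u0, u2 by speaker A and u1 by speaker B
feats = {"u0": [1.0, 0.0], "u1": [0.0, 1.0], "u2": [1.0, 1.0]}
# intra-speaker edge u0->u2; inter-speaker edges u0->u1 and u1->u2
edges = [("u0", "u2"), ("u0", "u1"), ("u1", "u2")]
out = message_pass(feats, edges)
```

In the real model, such aggregation steps are learned (with weighted transformations and attention) and the node features come from the fused audio, visual, and text modalities; this toy version only demonstrates how information flows between utterance nodes.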
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSEI (test) | -- | 206 |
| Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score: 67.6 | 129 |
| Emotion Recognition | IEMOCAP | Accuracy: 68.2 | 71 |
| Multimodal Emotion Recognition in Conversation | MELD standard (test) | WF1: 58.66 | 38 |
| Multimodal Emotion Recognition in Conversation | IEMOCAP 6-class (test) | Weighted F1 Score (WF1): 67.6 | 33 |
| Multimodal Emotion Recognition | IEMOCAP 6-way | F1 (Avg): 67.63 | 28 |
| Emotion Recognition | CMU-MOSEI (test) | -- | 19 |
| Multimodal Emotion Recognition | IEMOCAP 4-way | Happy Score: 78.8 | 14 |
| Multimodal Emotion Recognition in Conversation | IEMOCAP 4-class (test) | F1 Score (Weighted): 84.5 | 8 |
| Sentiment Classification | MOSEI (test) | Accuracy (2 Class): 85 | 7 |