
COGMEN: COntextualized GNN based Multimodal Emotion recognitioN

About

Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. During a conversation involving multiple people, a person's emotions are influenced both by the other speakers' utterances and by their own emotional state across the utterances. In this paper, we propose the COntextualized Graph Neural Network based Multimodal Emotion recognitioN (COGMEN) system, which leverages local information (i.e., inter- and intra-speaker dependencies) and global information (context). The proposed model uses a Graph Neural Network (GNN) based architecture to model the complex dependencies (local and global information) in a conversation. Our model gives state-of-the-art (SOTA) results on the IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.
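To make the "local information" idea concrete, the sketch below builds a typed conversation graph of the kind the abstract describes: each utterance is a node, and edges within a context window are labeled intra-speaker (same speaker) or inter-speaker (different speakers). This is an illustrative assumption about the graph construction, not the paper's exact implementation; the function name, relation labels, and window hyperparameter are all hypothetical.

```python
# Hedged sketch: constructing a relational conversation graph.
# Nodes are utterance indices; edges (i, j, relation) connect each
# utterance to its neighbors within a past/future context window,
# typed by whether the two utterances share a speaker.

def build_conversation_graph(speakers, window=2):
    """Return typed directed edges (i, j, relation) between utterances.

    speakers: list mapping utterance index -> speaker id.
    window:   how many past/future utterances each node connects to
              (an assumed hyperparameter controlling local context).
    """
    edges = []
    n = len(speakers)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if i == j:
                continue
            # Same speaker -> intra-speaker dependency; else inter-speaker.
            relation = "intra" if speakers[i] == speakers[j] else "inter"
            edges.append((i, j, relation))
    return edges

# Example: a 4-utterance dialogue alternating between speakers A and B.
edges = build_conversation_graph(["A", "B", "A", "B"], window=1)
```

A relational GNN (e.g., one convolution weight matrix per edge type) can then propagate features over these typed edges, letting the model treat self-influence and cross-speaker influence differently.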

Abhinav Joshi, Ashwani Bhat, Ayush Jain, Atin Vikram Singh, Ashutosh Modi • 2022

Related benchmarks

Task | Dataset | Result | Rank
Multimodal Sentiment Analysis | CMU-MOSEI (test) | -- | 206
Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score: 67.6 | 129
Emotion Recognition | IEMOCAP | Accuracy: 68.2 | 71
Multimodal Emotion Recognition in Conversation | MELD standard (test) | WF1: 58.66 | 38
Multimodal Emotion Recognition in Conversation | IEMOCAP 6-class (test) | Weighted F1 Score (WF1): 67.6 | 33
Multimodal Emotion Recognition | IEMOCAP 6-way | F1 (Avg): 67.63 | 28
Emotion Recognition | CMU-MOSEI (test) | -- | 19
Multimodal Emotion Recognition | IEMOCAP 4-way | Happy Score: 78.8 | 14
Multimodal Emotion Recognition in Conversation | IEMOCAP 4-class (test) | F1 Score (Weighted): 84.5 | 8
Sentiment Classification | MOSEI (test) | Accuracy (2 Class): 85 | 7

Showing 10 of 14 rows

Other info

Code
