COGMEN: COntextualized GNN based Multimodal Emotion recognitioN
About
Emotions are an inherent part of human interactions, and consequently, it is imperative to develop AI systems that understand and recognize human emotions. In a conversation involving several people, a person's emotions are influenced by the other speakers' utterances and by their own emotional state over the course of the conversation. In this paper, we propose the COntextualized Graph Neural Network based Multimodal Emotion recognitioN (COGMEN) system, which leverages local information (i.e., inter/intra-speaker dependencies) and global information (context). The proposed model uses a Graph Neural Network (GNN) based architecture to model these complex dependencies (local and global information) in a conversation. Our model gives state-of-the-art (SOTA) results on the IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.
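To make the graph-based idea concrete, the sketch below shows one round of mean-aggregation message passing over a toy conversation graph, where each utterance is a node and directed edges encode intra-/inter-speaker dependencies. This is only an illustrative sketch of the general GNN technique, not the COGMEN implementation; all names, the toy data, and the aggregation rule are assumptions.

```python
# Illustrative sketch (NOT the COGMEN code): one round of mean-neighbour
# message passing over a conversation graph. Each utterance is a node;
# edges model intra-speaker and inter-speaker dependencies.

def message_pass(features, edges):
    """features: dict node_id -> list[float] (utterance embedding)
    edges:    list of (src, dst) directed dependency pairs
    Returns updated node features after one aggregation step."""
    # Collect incoming messages for every node
    incoming = {n: [] for n in features}
    for src, dst in edges:
        incoming[dst].append(features[src])

    updated = {}
    for node, feat in features.items():
        msgs = incoming[node]
        if not msgs:
            # No neighbours: keep the node's own state
            updated[node] = feat[:]
            continue
        dim = len(feat)
        # Mean of neighbour features, blended equally with the node's own state
        mean = [sum(m[d] for m in msgs) / len(msgs) for d in range(dim)]
        updated[node] = [(feat[d] + mean[d]) / 2 for d in range(dim)]
    return updated

# Toy conversation: utterances u0, u2 by speaker A and u1 by speaker B
feats = {"u0": [1.0, 0.0], "u1": [0.0, 1.0], "u2": [1.0, 1.0]}
# intra-speaker edge u0->u2; inter-speaker edges u0->u1 and u1->u2
edges = [("u0", "u2"), ("u0", "u1"), ("u1", "u2")]
out = message_pass(feats, edges)
```

In the real model, such aggregation steps are learned (with weighted transformations and attention) and the node features come from the fused audio, visual, and text modalities; this toy version only demonstrates how information flows between utterance nodes.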
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSEI (test) | -- | 206 |
| Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score: 67.6 | 129 |
| Emotion Recognition | IEMOCAP | Accuracy: 68.2 | 71 |
| Multimodal Emotion Recognition in Conversation | MELD standard (test) | WF1: 58.66 | 38 |
| Multimodal Emotion Recognition in Conversation | IEMOCAP 6-class (test) | Weighted F1 Score (WF1): 67.6 | 33 |
| Multimodal Emotion Recognition | IEMOCAP 6-way | F1 (Avg): 67.63 | 28 |
| Emotion Recognition | CMU-MOSEI (test) | -- | 19 |
| Multimodal Emotion Recognition | IEMOCAP 4-way | Happy Score: 78.8 | 14 |
| Multimodal Emotion Recognition in Conversation | IEMOCAP 4-class (test) | F1 Score (Weighted): 84.5 | 8 |
| Sentiment Classification | MOSEI (test) | Accuracy (2 Class): 85 | 7 |