Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

About

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on DFEW dataset.

Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann• 2024

Related benchmarks

Task	Dataset	Result
Multimodal Sentiment Analysis	CMU-MOSI (test)	--	390
Multimodal Sentiment Analysis	MOSEI	--	210
Emotion Recognition	IEMOCAP	--	151
Multimodal Sentiment Analysis	CH-SIMS (test)	F1 Score75.4	108
Sentiment Analysis	CMU-MOSEI (test)	--	96
Emotion Recognition	MELD (test)	Weighted F146.76	89
Dynamic Facial Expression Recognition	DFEW (5-fold cross-val)	UAR64.21	63
Emotion Recognition	MER-UniBench (test)	MER2359.38	62
Emotion Classification	IEMOCAP (test)	Weighted-F155.47	61
Dynamic Facial Expression Recognition	DFEW	WAR77.06	55

Showing 10 of 73 rows

...

Other info

Code

Follow for update

@wizwand_team Discord