
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

About

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.

Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann • 2024

Related benchmarks

Task | Dataset | Result | Rank
Multimodal Sentiment Analysis | CMU-MOSI (test) | -- | 385
Multimodal Sentiment Analysis | MOSEI | -- | 183
Emotion Recognition | IEMOCAP | -- | 151
Multimodal Sentiment Analysis | CH-SIMS (test) | F1 Score 75.4 | 108
Sentiment Analysis | CMU-MOSEI (test) | -- | 96
Emotion Recognition | MELD (test) | Weighted F1 46.76 | 89
Emotion Classification | IEMOCAP (test) | Weighted-F1 55.47 | 61
Dynamic Facial Expression Recognition | DFEW | WAR 77.06 | 47
Multimodal Emotion Recognition in Conversation | MELD | Weighted Avg F1 Score 46.76 | 36
Emotion Recognition | MER-UniBench (test) | MER23 59.38 | 35
Showing 10 of 51 rows
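
The DFEW results are reported as UAR and WAR. As these terms are commonly used, WAR (weighted average recall) is the overall fraction of correctly classified samples, while UAR (unweighted average recall) is the mean of the per-class recalls, so rare classes count as much as frequent ones. The following is a minimal sketch of both metrics, not the authors' evaluation code; the function name `war_uar` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) as fractions in [0, 1].

    WAR: correct predictions divided by total samples (overall accuracy).
    UAR: per-class recall averaged with equal weight per class.
    Illustrative helper, not taken from the Emotion-LLaMA code base.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Number of samples per ground-truth class.
    support = Counter(y_true)
    # Number of correctly predicted samples per ground-truth class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    war = sum(correct.values()) / len(y_true)
    uar = sum(correct[c] / n for c, n in support.items()) / len(support)
    return war, uar


# Toy example with an imbalanced label set (hypothetical data).
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "happy", "happy"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR={war:.2f} UAR={uar:.2f}")
```

In the toy example a classifier that always predicts the majority class still gets three of four samples right, but its UAR is lower because recall on the minority class is zero. This is why the two numbers are usually reported together on imbalanced emotion datasets.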
