
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

About

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotion recognition and reasoning capabilities. Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.

Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann • 2024
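
The abstract reports both UAR and WAR on DFEW. As a rough illustration of how the two differ, the sketch below uses the standard definitions: UAR (unweighted average recall) is the plain mean of per-class recall, and WAR (weighted average recall) weights each class by its sample count, which makes it equal to overall accuracy. The function names and the toy labels are assumptions made for this example; they are not taken from the paper or its code.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return ({label: recall}, {label: support}) for every label in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = {label: hits[label] / n for label, n in support.items()}
    return recalls, support

def uar(y_true, y_pred):
    """Unweighted average recall: every class counts equally."""
    recalls, _ = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def war(y_true, y_pred):
    """Weighted average recall: classes weighted by their sample count."""
    recalls, support = per_class_recall(y_true, y_pred)
    total = sum(support.values())
    return sum(recalls[c] * support[c] for c in recalls) / total

# Hypothetical, imbalanced toy labels: the model always predicts "happy".
y_true = ["happy", "happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "happy", "happy", "happy"]

toy_uar = uar(y_true, y_pred)  # mean of recall(happy)=1.0 and recall(sad)=0.0
toy_war = war(y_true, y_pred)  # 4 of 5 samples are classified correctly
```

On imbalanced data such as this toy case, WAR stays high while UAR drops, which is one reason benchmarks often report both.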

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Dynamic Facial Expression Recognition | DFEW | UAR 64.21 | 27 |
| Emotion Cognition and Reasoning | HitEmotion ECR level 1.0 (test) | EER 42.81 | 23 |
| Emotion Understanding and Analysis | HitEmotion | DPTM (MF) 39.54 | 23 |
| Emotion Perception and Recognition | HitEmotion Level 1 | FESD 33.11 | 23 |
| Multimodal Emotion Reasoning | EMER | Clue Overlap 7.83 | 18 |
| Multimodal Emotion Recognition | MER 2023 | F1 Score 90.36 | 16 |
| Multimodal Emotion Recognition | DFEW | Hap 93.05 | 15 |
| Emotion and Micro-expression Analysis | PRISM | Macro-expression Accuracy 73.5 | 13 |
| Confidence Estimation | VECBench ID VER | ECE 62.19 | 13 |
| Confidence Estimation | VECBench ID VSA | ECE 29.35 | 13 |
Showing 10 of 25 rows
