Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
About
Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address these issues, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotion recognition and reasoning capabilities. Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.
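The DFEW results above are reported as UAR and WAR. As commonly defined, UAR (unweighted average recall) is the mean of the per-class recalls, and WAR (weighted average recall) is the recall weighted by class frequency, which equals overall accuracy. The sketch below illustrates the two metrics on made-up labels; the function name `uar_war` and the toy data are assumptions for illustration and are not taken from the Emotion-LLaMA evaluation code.

```python
from collections import Counter


def uar_war(y_true, y_pred):
    """Return (UAR, WAR) as percentages for two equal-length label lists.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Number of samples per ground-truth class.
    total = Counter(y_true)
    # Number of correctly predicted samples per ground-truth class.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # UAR: every class contributes equally, regardless of its size.
    recalls = [correct[c] / total[c] for c in total]
    uar = 100.0 * sum(recalls) / len(recalls)

    # WAR: classes are weighted by frequency, i.e. overall accuracy.
    war = 100.0 * sum(correct.values()) / len(y_true)
    return uar, war


# Toy labels (invented, not from DFEW).
y_true = ["happy", "happy", "happy", "sad", "angry", "angry"]
y_pred = ["happy", "happy", "sad", "sad", "happy", "angry"]

uar, war = uar_war(y_true, y_pred)
print(f"UAR={uar:.2f} WAR={war:.2f}")  # prints "UAR=72.22 WAR=66.67"
```

On imbalanced datasets the two numbers can diverge noticeably, because UAR penalizes poor recall on rare classes while WAR is dominated by the frequent ones.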
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Dynamic Facial Expression Recognition | DFEW | UAR | 64.21 | 27 |
| Emotion Cognition and Reasoning | HitEmotion ECR level 1.0 (test) | EER | 42.81 | 23 |
| Emotion Understanding and Analysis | HitEmotion | DPTM (MF) | 39.54 | 23 |
| Emotion Perception and Recognition | HitEmotion Level 1 | FESD | 33.11 | 23 |
| Multimodal Emotion Reasoning | EMER | Clue Overlap | 7.83 | 18 |
| Multimodal Emotion Recognition | MER 2023 | F1 Score | 90.36 | 16 |
| Multimodal Emotion Recognition | DFEW | Hap | 93.05 | 15 |
| Emotion and Micro-expression Analysis | PRISM | Macro-expression Accuracy | 73.5 | 13 |
| Confidence Estimation | VECBench ID VER | ECE | 62.19 | 13 |
| Confidence Estimation | VECBench ID VSA | ECE | 29.35 | 13 |