EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment
About
Understanding the multi-dimensional attributes and intensity nuances of image-evoked emotions is pivotal for advancing machine empathy and empowering diverse human-computer interaction applications. However, existing models remain limited to coarse-grained emotion perception or exhibit deficient reasoning capabilities. To bridge this gap, we introduce EEmoDB, the largest image-evoked emotion understanding dataset to date. It features $5$ analysis dimensions spanning $5$ distinct task categories, facilitating comprehensive interpretation. Specifically, we compile $1.2M$ question-answering (QA) pairs (EEmoDB-QA) from $125k$ images via automated generation, alongside a $36k$ dataset (EEmoDB-Assess) curated from $25k$ images for fine-grained assessment. Furthermore, we propose EEmo-Logic, an all-in-one multimodal large language model (MLLM) developed via instruction fine-tuning and task-customized group relative preference optimization (GRPO) with a novel reward design. Extensive experiments demonstrate that EEmo-Logic achieves robust performance on both in-domain and cross-domain datasets, excelling in emotion QA and fine-grained assessment. The code is available at https://anonymous.4open.science/r/EEmoLogic.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Emotion Perception | EEmo-Bench | Overall Perception Score | 68.54 | 50 |
| Comprehensive Emotion Assessment | EEmo-Bench | Total Overall Score | 0.6622 | 25 |
| Emotion Ranking | EEmo-Bench | Emotion Score | 67.97 | 25 |
| Emotion Assessment | EEmo-Bench 1.0 (test) | Valence SRCC | 0.85 | 10 |
| Dominant evoked emotion recognition | ArtEmis | F1 Score | 31.58 | 6 |
| Emotion-related aesthetic assessment | AesBench AesE | Emotion | 72.3 | 6 |
| Dominant evoked emotion recognition | Artphoto | F1 Score | 38.68 | 6 |
| Emotion-related aesthetic assessment | UNIAA Sent | Overall Score | 75.65 | 6 |
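
The Valence SRCC entry above is a Spearman rank correlation coefficient, which measures how well predicted scores preserve the ordering of the reference scores. The following is a minimal sketch of how such a value can be computed, not the benchmark's official evaluation script. The function names `average_ranks` and `srcc` and the sample scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted: Sequence[float], annotated: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(annotated) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(predicted), average_ranks(annotated)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical valence scores, for illustration only (not data from the paper).
model_valence = [0.2, 0.9, 0.4, 0.7]
human_valence = [1.5, 4.8, 2.0, 3.9]
print(srcc(model_valence, human_valence))
```

Because only ranks enter the computation, the two score lists may use different scales, as in the sample above. Where SciPy is available, `scipy.stats.spearmanr` provides the same statistic.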