EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration

About

Visual Emotion Comprehension (VEC) aims to infer sentiment polarities or emotion categories from affective cues embedded in images. In recent years, Multimodal Large Language Models (MLLMs) have established a popular paradigm in VEC, leveraging their generalizability to unify VEC tasks defined under diverse emotion taxonomies. While this paradigm achieves notable success, it typically formulates VEC as a deterministic task, requiring the model to output a single, definitive emotion label for each image. Such a formulation insufficiently accounts for the inherent subjectivity of emotion perception, overlooking alternative interpretations that may be equally plausible to different viewers. To address this limitation, we propose equipping MLLMs with the capability to verbalize their confidence in emotion predictions. This additional signal provides users with an estimate of both the plausibility of alternative interpretations and the MLLMs' self-assessed competence, thereby enhancing reliability in practice. Building on this insight, we introduce a three-stage training framework that progressively endows the model with structured reasoning, teaches it to verbalize confidence, and calibrates its confidence expression, culminating in EmoCaliber, a confidence-aware MLLM for VEC. Through fair and comprehensive evaluations on the unified benchmark VECBench, EmoCaliber demonstrates overall superiority over existing methods in both emotion prediction and confidence estimation. These results validate the effectiveness of our approach and mark a feasible step toward more reliable VEC systems. Project page: https://github.com/wdqqdw/EmoCaliber.

Daiqing Wu, Dongbao Yang, Can Ma, Yu Zhou • 2025

Related benchmarks

Task	Dataset	Result
Emotion Perception	EEmo-Bench	Overall Perception Score 60.67	50
Emotion Ranking	EEmo-Bench	Emotion Score 61.84	25
Comprehensive Emotion Assessment	EEmo-Bench	Total Overall Score 0.5185	25
Confidence Estimation	VECBench ID VER	ECE 13.63	13
Confidence Estimation	VECBench OOD VER	ECE 0.1217	13
Confidence Estimation	VECBench ID VSA	ECE 4.76	13
Visual Emotion Recognition	VECBench In-Domain ID VER (test)	FI-8 Accuracy 69.7	7
Emotion Prediction	FI-2 ID VSA, VECBench	Accuracy 88.1	7
Emotion Prediction	WebEmo-2 ID VSA, VECBench	Accuracy 75.8	7
Visual Emotion Recognition	VECBench VER (OOD)	UnbiasedEmo-6 Accuracy 79.9	7

Showing 10 of 14 rows
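
Several rows above report ECE (expected calibration error) for confidence estimation. As an illustration only, and not the evaluation code used for VECBench, ECE might be computed along the following lines. The sketch assumes that verbalized confidences have already been parsed into values in [0, 1], that equal-width bins are used, and that 10 bins is a reasonable default; the function name expected_calibration_error is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Sketch of ECE with equal-width bins over [0, 1].

    confidences: floats in [0, 1], one per prediction (assumed already
                 parsed from the model's verbalized confidence).
    correct:     booleans, True where the predicted label was right.
    n_bins:      number of equal-width bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # summed confidence per bin
    hits = [0] * n_bins        # number of correct predictions per bin
    counts = [0] * n_bins      # number of predictions per bin

    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += int(ok)
        counts[idx] += 1

    ece = 0.0
    for s, h, k in zip(conf_sum, hits, counts):
        if k == 0:
            continue  # empty bins contribute nothing
        accuracy = h / k
        avg_confidence = s / k
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (k / n) * abs(accuracy - avg_confidence)
    return ece
```

The function returns a fraction in [0, 1]; multiplying by 100 gives the percentage form. A lower value means the stated confidence tracks the observed accuracy more closely.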
