EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations

About

Recent advances in large language models and vision-language models have led to growing interest in explainable evaluation metrics for image captioning. However, these metrics generate explanations without standardized criteria, and the overall quality of the generated explanations remains unverified. In this paper, we propose EXPERT, a reference-free evaluation metric that provides structured explanations based on three fundamental criteria: fluency, relevance, and descriptiveness. By constructing large-scale datasets of high-quality structured explanations, we develop a two-stage evaluation template to effectively supervise a vision-language model for both scoring and explanation generation. EXPERT achieves state-of-the-art results on benchmark datasets while providing significantly higher-quality explanations than existing metrics, as validated through comprehensive human evaluation. Our code and datasets are available at https://github.com/hjkim811/EXPERT.

Hyunjong Kim, Sangyeop Kim, Jongheon Jeong, Yeongjae Cho, Sungzoon Cho • 2025

Related benchmarks

Task                         Dataset          Result              Rank
Image Captioning Evaluation  Composite        Kendall tau-c 65    92
Image Captioning Evaluation  Flickr8k Expert  Kendall tau-c 56.7  73
Image Captioning Evaluation  Nebula           Kendall tau-c 54.9  22
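
The results above are Kendall tau-c correlations between metric scores and human judgments. As a rough illustration of how such a number can be computed, here is a minimal sketch of Stuart's tau-c in plain Python. The function name `kendall_tau_c` and the sample scores are hypothetical and are not taken from the EXPERT code or datasets. The sketch also assumes the listed results are tau-c values multiplied by 100, which this page does not state.

```python
from itertools import combinations


def kendall_tau_c(metric_scores, human_scores):
    """Stuart's tau-c between two equally long score lists (illustrative helper)."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")

    n = len(metric_scores)
    # m is the smaller number of distinct values across the two variables.
    m = min(len(set(metric_scores)), len(set(human_scores)))
    if n < 2 or m < 2:
        raise ValueError("need at least two samples and two distinct values")

    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(metric_scores, human_scores), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.

    # tau_c = 2 * (P - Q) / (n^2 * (m - 1) / m)
    return 2 * (concordant - discordant) / (n * n * (m - 1) / m)


# Made-up example: four captions scored by a metric and rated by a human.
metric = [0.9, 0.4, 0.7, 0.2]
human = [4, 2, 4, 1]
tau_c = kendall_tau_c(metric, human)
print(f"tau-c = {tau_c:.4f}, scaled = {100 * tau_c:.1f}")
```

Tau-c is often preferred over tau-b when the two variables take different numbers of distinct values, as with continuous metric scores and a small set of discrete human ratings. If SciPy is available, `scipy.stats.kendalltau(x, y, variant="c")` computes the same statistic.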
