InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

About

Automatic image captioning evaluation is critical for benchmarking and promoting advances in image captioning research. Existing metrics provide only a single score to measure caption quality, which makes them less explainable and informative. Humans, by contrast, can easily identify the problems of a caption in detail, e.g., which words are inaccurate and which salient objects are not described, and then rate the caption quality. To support such informative feedback, we propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC). Given an image and a caption, InfoMetIC reports incorrect words and unmentioned image regions at the fine-grained level, and provides a text precision score, a vision recall score and an overall quality score at the coarse-grained level. The coarse-grained score of InfoMetIC achieves significantly better correlation with human judgements than existing metrics on multiple benchmarks. We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation. Our code and datasets are publicly available at https://github.com/HAWLYQ/InfoMetIC.

Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin • 2023

Related benchmarks

Task                          Dataset                  Metric                          Result  Rank
Image Captioning Evaluation   Composite                Kendall-c Tau_c                 59.3    92
Image Captioning Evaluation   Flickr8K Expert (test)   Kendall tau_c                   55.5    76
Image Captioning Evaluation   Flickr8k Expert          Kendall Tau-c (tau_c)           55.5    73
Image Captioning Evaluation   Flickr8K-CF (test)       Kendall tau_b                   36.6    65
Image Captioning Evaluation   Flickr8K-CF              Kendall-b Correlation (tau_b)   36.6    62
Image Captioning Evaluation   Pascal-50S               Mean Score                      86.5    39
Image-to-Text Retrieval       NoCaps                   R@1                             90.9    17
Text-to-Image Retrieval       NoCaps                   Recall@1                        76.2    17
Image Captioning Evaluation   COMPOSITE (COM) (test)   Kendall's tau-c                 59.3    17
Image Captioning Evaluation   THumB w/ Human 1.0       Precision                       21      16
Showing 10 of 11 rows
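
Most of the benchmarks listed here report Kendall tau_b or tau_c between metric scores and human ratings. As a rough illustration of what those two statistics measure, here is a minimal pure-Python sketch using the standard textbook definitions. It is not taken from the InfoMetIC code base, and the function names and sample data are hypothetical. The listed results are presumably correlations scaled by 100, but that reading is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def _concordance(xs, ys):
    """Count concordant and discordant pairs; pairs tied in x or y count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


def _tied_pairs(values):
    """Number of pairs that share the same value."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def kendall_tau_b(xs, ys):
    """Tau-b: corrects the denominator for ties in either variable."""
    n = len(xs)
    c, d = _concordance(xs, ys)
    n0 = n * (n - 1) // 2
    return (c - d) / sqrt((n0 - _tied_pairs(xs)) * (n0 - _tied_pairs(ys)))


def kendall_tau_c(xs, ys):
    """Tau-c (Stuart): suited to variables with different numbers of distinct values."""
    n = len(xs)
    c, d = _concordance(xs, ys)
    m = min(len(set(xs)), len(set(ys)))
    return 2 * (c - d) / (n * n * (m - 1) / m)


# Hypothetical data: metric scores for five captions and their human ratings.
metric_scores = [0.81, 0.42, 0.65, 0.30, 0.77]
human_ratings = [4, 2, 3, 2, 4]
print(round(kendall_tau_b(metric_scores, human_ratings), 4))
print(round(kendall_tau_c(metric_scores, human_ratings), 4))
```

Tau-b suits data where both variables share the same scale, while tau-c is the usual choice when a continuous metric score is compared against a coarse rating scale, as in the sample data.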
