
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

About

Despite the success of various text generation metrics such as BERTScore, it is still difficult to evaluate image captions without enough reference captions due to the diversity of possible descriptions. In this paper, we introduce UMIC, an Unreferenced Metric for Image Captioning, which does not require reference captions to evaluate image captions. Based on Vision-and-Language BERT, we train UMIC to discriminate negative captions via contrastive learning. We also observe critical problems in the previous benchmark datasets (i.e., human annotations) for image captioning metrics, and introduce a new collection of human annotations on generated captions. We validate UMIC on four datasets, including our new dataset, and show that UMIC has a higher correlation with human judgments than all previous metrics that require multiple references. We release the benchmark dataset and pre-trained models to compute UMIC.
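As a rough illustration of the contrastive-learning idea described above, the sketch below trains a toy scorer to rank a matching caption above a corrupted one using a margin ranking loss. The encoder, feature dimensions, loss form, and margin value are illustrative assumptions for the sketch and do not reproduce the released UMIC model.

```python
# Hypothetical sketch: contrastive training of a caption-scoring metric.
# All model components and hyperparameters here are illustrative assumptions,
# not the released UMIC implementation.
import torch
import torch.nn as nn

class CaptionScorer(nn.Module):
    """Toy stand-in for a vision-and-language encoder that maps an
    (image, caption) feature pair to a single compatibility score in [0, 1]."""
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, img_feat, cap_feat):
        return torch.sigmoid(self.mlp(torch.cat([img_feat, cap_feat], dim=-1)))

def contrastive_margin_loss(pos_score, neg_score, margin=0.2):
    """Push the score of the correct caption above the negative caption
    by at least `margin` (hinge-style ranking loss)."""
    return torch.clamp(margin - pos_score + neg_score, min=0).mean()

# Toy batch of pre-extracted features (random stand-ins for real data).
img = torch.randn(8, 2048)
pos_cap = torch.randn(8, 768)   # correct captions for each image
neg_cap = torch.randn(8, 768)   # corrupted / mismatched captions

scorer = CaptionScorer()
loss = contrastive_margin_loss(scorer(img, pos_cap), scorer(img, neg_cap))
loss.backward()
print(float(loss))
```

At evaluation time, a scorer trained this way can rate a generated caption against its image directly, which is what lets the metric operate without reference captions.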

Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Kyomin Jung • 2021

Related benchmarks

Task                              Dataset                          Metric          Result   Rank
Image Captioning Evaluation       Composite                        Kendall Tau-c   56.1     92
Image Captioning Evaluation       Flickr8K Expert (test)           Kendall Tau-c   46.8     76
Image Captioning Evaluation       Flickr8k Expert                  Kendall Tau-c   46.8     73
Image Captioning Evaluation       Pascal-50S                       Mean Score      85.1     39
Correlation with Human Judgments  Flickr8K-CF                      Kendall Tau-b   30.1     26
Correlation with Human Judgments  Composite (test)                 Kendall Tau-c   56.1     18
Correlation with Human Judgments  Flickr8k Expert                  Kendall Tau-c   46.8     17
Pairwise Ranking Accuracy         Pascal-50S 5-references (test)   HC              66.1     16
Correlation with Human Judgments  Polaris (test)                   Kendall Tau-c   0.498    16
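The Kendall correlations in the table measure rank agreement between metric scores and human judgments. As a minimal sketch of how such numbers are typically computed (the ratings and scores below are made-up values, not data from the paper), SciPy's `kendalltau` supports both the tau-b and tau-c variants:

```python
# Illustrative only: rank-correlate metric scores against human ratings.
# Requires SciPy >= 1.7 for the `variant` parameter.
from scipy.stats import kendalltau

human_ratings = [1, 2, 2, 3, 4, 4, 5, 5]                          # e.g., expert judgments
metric_scores = [0.21, 0.30, 0.28, 0.44, 0.52, 0.61, 0.70, 0.66]  # e.g., metric outputs

tau_b, _ = kendalltau(human_ratings, metric_scores, variant="b")
tau_c, _ = kendalltau(human_ratings, metric_scores, variant="c")
print(f"Kendall tau-b: {tau_b:.3f}, tau-c: {tau_c:.3f}")
```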
