
Comprehension-guided referring expressions

About

We consider the generation and comprehension of natural language referring expressions for objects in an image. Unlike generic "image captioning," which lacks a natural standard evaluation criterion, the quality of a referring expression can be measured by the receiver's ability to correctly infer which object is being described. Following this intuition, we propose two approaches that use models trained on the comprehension task to generate better expressions. First, we use a comprehension module trained on human-generated expressions as a "critic" of the referring expression generator. The comprehension module serves as a differentiable proxy for human evaluation, providing a training signal to the generation module. Second, we use the comprehension module in a generate-and-rerank pipeline, which chooses among candidate expressions produced by a generation model according to their performance on the comprehension task. We show that both approaches improve referring expression generation on multiple benchmark datasets.
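The generate-and-rerank pipeline can be sketched with a toy comprehension scorer. This is a minimal illustration, not the paper's implementation: the actual generator and comprehension module are neural models trained on referring expression data, and `toy_score`, `rerank`, and the candidate list below are hypothetical stand-ins.

```python
def rerank(candidates, comprehension_score, target):
    """Pick the candidate expression that the comprehension scorer
    most confidently resolves to the intended target object."""
    return max(candidates, key=lambda expr: comprehension_score(expr, target))

def toy_score(expr, target):
    """Toy stand-in for a comprehension module: counts how many words
    of the expression match attributes of the target object."""
    return sum(1 for word in expr.split() if word in target)

# Hypothetical candidates from a generator, and a target described
# by a set of attribute words.
candidates = ["the dog", "the brown dog on the left", "a thing"]
target = {"brown", "dog", "left"}

best = rerank(candidates, toy_score, target)
print(best)  # the most discriminative candidate wins
```

The key design point carries over to the real pipeline: the generator only proposes candidates, while the comprehension model decides which one a listener would most likely resolve correctly.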

Ruotian Luo, Gregory Shakhnarovich · 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy | 74.04 | 333 |
| Referring Expression Comprehension | RefCOCOg (val) | Accuracy | 65.36 | 291 |
| Referring Expression Comprehension | RefCOCOg (test) | Accuracy | 60.3 | 291 |
| Referring Expression Comprehension | RefCOCO+ (testA) | Accuracy | 60.26 | 172 |
| Referring Expression Comprehension | RefCOCO+ (testB) | Accuracy | 55.03 | 167 |
| Referring Expression Comprehension | RefCOCO (testB) | Accuracy | 73.43 | 160 |
| Referring Expression Object Segmentation | RefCOCOg UMD (val) | — | — | 20 |
| Phrase grounding | ReferIt | Accuracy | 31.85 | 14 |
| Referring Expression Object Segmentation | RefCOCO UMD (testA) | Accuracy (IoU > 0.5) | 67.94 | 11 |
| Referring Expression Object Segmentation | RefCOCO UMD (testB) | Accuracy (IoU > 0.5) | 55.18 | 11 |
