
A Corpus for Reasoning About Natural Language Grounded in Photographs

About

We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge.

Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi • 2018
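The task described above is a binary decision: given a sentence and a pair of photographs, predict whether the sentence is true. Systems are compared by accuracy, the metric reported for the benchmarks on this page. The following is a minimal Python sketch under simplifying assumptions. The `Example` class, its field names, and the `accuracy` helper are hypothetical illustrations, not the dataset's released file format or official evaluation script. The four sentences are invented for demonstration and are not drawn from the data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Example:
    """Hypothetical task instance: one sentence paired with two photographs."""
    sentence: str
    left_image: str   # path or URL of the left photograph (assumed field)
    right_image: str  # path or URL of the right photograph (assumed field)
    label: bool       # True if the sentence is true about the image pair


# A model maps (sentence, left image, right image) to a True/False prediction.
Predictor = Callable[[str, str, str], bool]


def accuracy(model: Predictor, examples: Iterable[Example]) -> float:
    """Percentage of examples whose predicted truth value matches the label."""
    examples = list(examples)
    if not examples:
        raise ValueError("accuracy is undefined for an empty example set")
    correct = sum(
        model(ex.sentence, ex.left_image, ex.right_image) == ex.label
        for ex in examples
    )
    return 100.0 * correct / len(examples)


if __name__ == "__main__":
    # Invented toy examples; image names are placeholders.
    data = [
        Example("There are exactly two dogs in total.", "a.jpg", "b.jpg", True),
        Example("The left image contains more birds than the right image.", "c.jpg", "d.jpg", False),
        Example("One image shows a bottle with a red label.", "e.jpg", "f.jpg", True),
        Example("Both images show people on bicycles.", "g.jpg", "h.jpg", False),
    ]
    # Trivial baseline that ignores its inputs and always answers True.
    always_true = lambda sentence, left, right: True
    print(f"{accuracy(always_true, data):.1f}")  # prints 50.0
```

Because a trivial predictor that ignores both images and the sentence already scores the share of True labels, accuracies are most informative when read against such a baseline on the same split.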

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Natural Language Visual Reasoning | NLVR2 (test-p) | Accuracy | 54.8 | 327
Natural Language Visual Reasoning | NLVR2 (dev) | Accuracy | 54.8 | 288
Natural Language Visual Reasoning | NLVR2 (test-u) | Accuracy | 53.5 | 2
