
VQA: Visual Question Answering

About

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).
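The automatic evaluation mentioned above relies on the VQA accuracy metric: each question comes with ten human answers, and a predicted answer is counted as fully correct when at least three annotators gave it. A minimal sketch of this consensus metric (omitting the official answer normalization and the averaging over annotator subsets used by the released evaluation code; the function name is illustrative):

```python
def vqa_accuracy(predicted, human_answers):
    """Consensus accuracy for one question.

    An answer scores min(#matching annotators / 3, 1), so three or more
    agreeing humans yield full credit; fewer yield partial credit.
    """
    matches = sum(1 for a in human_answers if a == predicted)
    return min(matches / 3.0, 1.0)


# Example: 2 of 10 annotators said "cat" -> partial credit of 2/3.
score = vqa_accuracy("cat", ["cat", "cat"] + ["dog"] * 8)
```

Dataset-level accuracy is then the mean of this per-question score over all questions.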

Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh • 2015

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Question Answering | NExT-QA (test) | Accuracy | 44.92 | 204 |
| Video Question Answering | NExT-QA (val) | Overall Acc | 44.24 | 176 |
| Visual Question Answering | VQA (test-dev) | Acc (All) | 58.97 | 147 |
| Image Captioning | MS-COCO (test) | CIDEr | 91 | 117 |
| Visual Question Answering | VQA (test-std) | -- | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy | 57.8 | 100 |
| Audio-Visual Question Answering | MUSIC-AVQA 1.0 (test) | AV Localis Accuracy | 71.43 | 96 |
| Visual Question Answering (Multiple-choice) | VQA 1.0 (test-dev) | Accuracy (All) | 62.7 | 66 |
| Visual Question Answering | CLEVR (test) | Overall Accuracy | 52.3 | 61 |
| Audio-Visual Question Answering | MUSIC-AVQA (test) | Acc (Avg) | 65.18 | 59 |
(Showing 10 of 28 rows.)
