
Where To Look: Focus Regions for Visual Question Answering

About

We present a method that learns to answer visual questions by selecting the image regions relevant to the text-based query. Our method exhibits significant improvements on questions such as "what color," where a specific location must be examined, and "what room," where it selectively identifies informative image regions. Our model is evaluated on the VQA dataset, which is, to our knowledge, the largest human-annotated visual question answering dataset.
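The core idea, attending over candidate image regions based on their relevance to the question, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bilinear scoring form, the function names, and all dimensions are assumptions chosen for clarity.

```python
import numpy as np

def attend_and_answer(region_feats, q_emb, W, answer_W):
    """Score each image region against the question, pool region features
    by their softmax weights, and classify the answer.

    region_feats: (R, D) array of R candidate-region features
    q_emb:        (Q,) question embedding
    W:            (D, Q) learned bilinear scoring matrix (illustrative)
    answer_W:     (A, D+Q) answer classifier over pooled image + question
    """
    # Relevance score of each region to the question (bilinear form).
    scores = region_feats @ W @ q_emb                 # (R,)
    # Softmax over regions -> attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # (R,)
    # Attention-weighted average of region features.
    pooled = weights @ region_feats                   # (D,)
    # Predict answer logits from pooled image evidence plus the question.
    logits = answer_W @ np.concatenate([pooled, q_emb])  # (A,)
    return weights, logits
```

Questions like "what color is the umbrella" would ideally place most of the attention weight on the umbrella region, so the pooled feature is dominated by the evidence needed to answer.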

Kevin J. Shih, Saurabh Singh, Derek Hoiem • 2015

Related benchmarks

Task                                         | Dataset                       | Metric           | Result | Rank
Visual Question Answering (Multiple-choice)  | VQA 1.0 (test-dev)            | Accuracy (All)   | 62.44  | 66
Visual Question Answering (Multiple-choice)  | VQA 1.0 (test-standard)       | Accuracy (All)   | 62.43  | 27
Visual Question Answering (Multiple-choice)  | VQA (test-dev)                | Overall Accuracy | 62.4   | 17
Visual Question Answering                    | VQA COCO 2015 v1.0 (test-dev) | Overall Accuracy | 60.96  | 16
