
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

About

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation of this problem for which all parts are trained jointly. In contrast to previous efforts, we face a multi-modal problem where the language output (answer) is conditioned on visual and natural language input (image and question). Our approach, Neural-Image-QA, doubles the performance of the previous best approach on this problem. We provide additional insights into the problem by analyzing how much information is contained in the language part alone, for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers, which extend the original DAQUAR dataset to DAQUAR-Consensus.

Mateusz Malinowski, Marcus Rohrbach, Mario Fritz • 2015

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Question Answering | DAQUAR REDUCED (test) | Accuracy 60.27 | 33 |
| Visual Question Answering | DAQUAR-ALL full (test) | Accuracy 50.2 | 22 |
| Visual Question Answering | DAQUAR single-word answers portion | Accuracy 34.68 | 11 |
| Visual Question Answering | DAQUAR (reduced) | Accuracy 32.32 | 8 |
| Visual Question Answering | DAQUAR reduced Single answer | Accuracy 34.68 | 6 |
| Visual Question Answering | DAQUAR all Multiple answers | Accuracy 17.49 | 5 |
| Visual Question Answering | DAQUAR reduced Multiple answers | Accuracy 20.27 | 4 |
| Visual Question Answering | DAQUAR all Single answer | Accuracy 19.43 | 3 |
