Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

About

We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on the question. For the adaptive parameter prediction, we employ a separate parameter prediction network, which consists of a gated recurrent unit (GRU) that takes a question as its input and a fully-connected layer that generates a set of candidate weights as its output. However, it is challenging to construct a parameter prediction network for the large number of parameters in the fully-connected dynamic parameter layer of the CNN. We reduce the complexity of this problem by incorporating a hashing technique: a predefined hash function selects, from the candidate weights given by the parameter prediction network, the individual weights of the dynamic parameter layer. The proposed network, which joins the CNN for ImageQA with the parameter prediction network, is trained end-to-end through back-propagation, with its weights initialized from a pre-trained CNN and GRU. The proposed algorithm achieves state-of-the-art performance on all available public ImageQA benchmarks.

Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han • 2015
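
The hashing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names `hash_indices`, `predict_candidates`, and `dynamic_layer`, the particular hash formula, and all dimensions are assumptions made for illustration. The GRU and fully-connected layer are replaced by a single random projection as a stand-in. The sketch only shows how a small vector of question-dependent candidate weights can fill a much larger weight matrix through a fixed position-to-candidate mapping.

```python
import numpy as np

def hash_indices(out_dim, in_dim, num_candidates):
    """Map each (row, col) position of the weight matrix to a candidate index.

    Hypothetical hash: fixed in advance and independent of the question.
    """
    rows = np.arange(out_dim, dtype=np.int64)[:, None]
    cols = np.arange(in_dim, dtype=np.int64)[None, :]
    return ((rows * 73856093) ^ (cols * 19349663)) % num_candidates

def predict_candidates(question_vec, proj):
    """Stand-in for the parameter prediction network (GRU + fully-connected layer)."""
    return np.tanh(proj @ question_vec)

def dynamic_layer(image_feat, candidates, out_dim):
    """Apply a fully-connected layer whose weights are picked from `candidates`."""
    idx = hash_indices(out_dim, image_feat.shape[0], candidates.shape[0])
    weights = candidates[idx]  # shape (out_dim, in_dim); entries are shared via hashing
    return weights @ image_feat

# Toy sizes (assumed): 32 layer weights are filled from only 5 candidates.
rng = np.random.default_rng(0)
in_dim, out_dim, num_candidates, q_dim = 8, 4, 5, 6
proj = rng.standard_normal((num_candidates, q_dim))
image_feat = rng.standard_normal(in_dim)
question_a = rng.standard_normal(q_dim)
question_b = rng.standard_normal(q_dim)

# Same image features, different questions: the layer weights (and output) change.
out_a = dynamic_layer(image_feat, predict_candidates(question_a, proj), out_dim)
out_b = dynamic_layer(image_feat, predict_candidates(question_b, proj), out_dim)
```

Because the position-to-candidate mapping never changes, the prediction network only has to output `num_candidates` values rather than `out_dim * in_dim`, which is the reduction in complexity the abstract refers to.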

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Visual Question Answering | VQA (test-dev) | Acc (All): 62.48 | 147 |
| Visual Question Answering | VQA (test-std) | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy: 57.22 | 100 |
| Visual Question Answering (Multiple-choice) | VQA 1.0 (test-dev) | Accuracy (All): 62.5 | 66 |
| Visual Question Answering | COCO-QA (test) | WUPS (IoU=0.9): 70.84 | 51 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-standard) | Overall Accuracy: 57.4 | 50 |
| Image Question Answering | DAQUAR REDUCED (test) | Accuracy: 44.48 | 33 |
| Image Retrieval | Fashion200k (test) | Recall@1: 12.2 | 32 |
| Open-Ended Visual Question Answering | VQA (test-standard) | Accuracy (Overall): 57.4 | 32 |
| Visual Question Answering | VQA 1 (test-standard) | VQA Open-Ended Accuracy (All): 57.36 | 28 |
Showing 10 of 20 rows
