
Dynamic Memory Networks for Visual and Textual Question Answering

About

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes, we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

Caiming Xiong, Stephen Merity, Richard Socher • 2016

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Question Answering | VQA (test-dev) | Acc (All): 60.3 | 147 |
| Visual Question Answering | VQA (test-std) | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy: 60.3 | 100 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-standard) | Overall Accuracy: 60.4 | 50 |
| Visual Question Answer | VQA 1.0 (test-dev) | Overall Accuracy: 60.3 | 44 |
| Open-Ended Visual Question Answering | VQA (test-standard) | Accuracy (Overall): 60.4 | 32 |
| Question Answering | bAbI 10k (test) | Task 1: 1 Supporting Fact Error: 0.00 | 15 |
| Visual Question Answering | MemexQA (test) | Accuracy (How Many): 79.2 | 9 |
| Visual Question Answering (Open-Ended) | VQA (test-dev) | Yes/No Accuracy: 80.5 | 8 |
| Textual Question Answering | bAbI English 10k (test) | Failed Tasks Count (Error > 5%): 1 | 7 |
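
The "Failed Tasks Count (Error > 5%)" metric in the last row can be read as the number of bAbI tasks whose test error exceeds 5%. The sketch below shows that arithmetic under stated assumptions: the function name `count_failed_tasks`, the dictionary layout, and the per-task error values are hypothetical and chosen only for illustration; they are not the paper's numbers. It also assumes the comparison is strict (an error of exactly 5% does not count as a failure), which the listing does not specify.

```python
def count_failed_tasks(errors_pct, threshold=5.0):
    """Count tasks whose test error (in percent) is strictly above the threshold."""
    return sum(1 for err in errors_pct.values() if err > threshold)


# Hypothetical per-task errors in percent, for illustration only
# (not results reported for DMN+).
example_errors = {"task_1": 0.0, "task_2": 0.3, "task_3": 7.5, "task_4": 5.0}

failed = count_failed_tasks(example_errors)
print(f"Failed tasks: {failed}")
```

With these made-up values only `task_3` is above the threshold, so the count is 1. Whether a benchmark treats the 5% boundary as strict or inclusive should be checked against its own definition.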