Learning to Compose Neural Networks for Question Answering
About
We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural module network, achieves state-of-the-art results on benchmark datasets in both visual and structured domains.
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein • 2016
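The idea of assembling a network from composable modules can be illustrated with a small sketch. It is not the authors' implementation: the modules here are hand-written functions with no learned parameters, the layout is given by hand rather than predicted from the question, and there is no reinforcement learning. The names `find`, `conjoin`, `exists`, `assemble`, the nested-tuple layout format, and the toy world are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

World = Dict[str, Set[str]]   # entity name -> set of attribute strings (toy knowledge base)
Attention = List[float]       # one weight per entity, in sorted-name order
Layout = Union[str, Tuple]    # nested tuples such as ("and", left, right)


def find(world: World, attribute: str) -> Attention:
    """Attend to entities that carry the given attribute (hard 0/1 weights)."""
    return [1.0 if attribute in world[name] else 0.0 for name in sorted(world)]


def conjoin(left: Attention, right: Attention) -> Attention:
    """Intersect two attentions with an elementwise minimum."""
    return [min(a, b) for a, b in zip(left, right)]


def exists(attention: Attention) -> str:
    """Map an attention to a yes/no answer."""
    return "yes" if max(attention, default=0.0) > 0.5 else "no"


def assemble(layout: Layout) -> Callable[[World], object]:
    """Recursively turn a layout into a callable that maps a world to an output."""
    head, *args = layout
    if head == "find":
        (attribute,) = args
        return lambda world: find(world, attribute)
    if head == "and":
        left, right = (assemble(a) for a in args)
        return lambda world: conjoin(left(world), right(world))
    if head == "exists":
        (child,) = (assemble(a) for a in args)
        return lambda world: exists(child(world))
    raise ValueError(f"unknown module: {head!r}")


# Hypothetical world and a hand-written layout for "Is there a red round thing?"
world: World = {
    "ball": {"red", "round"},
    "box": {"red", "square"},
    "cone": {"blue"},
}
layout = ("exists", ("and", ("find", "red"), ("find", "round")))
network = assemble(layout)
print(network(world))  # prints "yes"
```

In the model described above, each module would instead carry trainable parameters and the layout would be chosen per question, with both trained from (world, question, answer) triples.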
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Question Answering | VQA (test-dev) | Acc (All) | 59.4 | 147 |
| Visual Question Answering | VQA (test-std) | -- | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy | 59.4 | 100 |
| Visual Question Answering (Multiple-choice) | VQA 1.0 (test-dev) | Accuracy (All) | 62.5 | 66 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-standard) | Overall Accuracy | 59.4 | 50 |
| Visual Question Answer | VQA 1.0 (test-dev) | Overall Accuracy | 59.4 | 44 |
| Open-Ended Visual Question Answering | VQA (test-standard) | Accuracy (Overall) | 59.4 | 32 |
| Visual Question Answering | VQA 1 (test-standard) | VQA Open-Ended Accuracy (All) | 59.4 | 28 |
| Visual Reasoning | NLVR v1 (Test-U) | Accuracy | 62 | 8 |