Learning to Compose Neural Networks for Question Answering

About

We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural module network, achieves state-of-the-art results on benchmark datasets in both visual and structured domains.

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein • 2016
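
The idea of assembling a network from composable modules according to a question-specific layout can be illustrated with a toy sketch. This is not the authors' implementation: the module names (`find`, `both`, `exists`), the nested-tuple layout format, the `run` helper, and the tiny world are all hypothetical, and the modules are hand-written set operations rather than learned neural networks. In the model described above, both the module parameters and the choice of layout are learned; here the layout is simply given.

```python
from typing import Dict, Set, Union

# Hypothetical toy "world": each entity maps to a set of attribute labels.
World = Dict[str, Set[str]]
# An attention assigns each entity a weight in [0, 1].
Attention = Dict[str, float]


def find(world: World, label: str) -> Attention:
    """Attend to entities that carry the given label."""
    return {name: 1.0 if label in attrs else 0.0 for name, attrs in world.items()}


def both(a: Attention, b: Attention) -> Attention:
    """Combine two attentions by taking the element-wise minimum."""
    return {name: min(a[name], b[name]) for name in a}


def exists(att: Attention) -> str:
    """Map an attention to a yes/no answer."""
    return "yes" if any(w > 0.5 for w in att.values()) else "no"


def run(layout: tuple, world: World) -> Union[Attention, str]:
    """Recursively evaluate a nested-tuple layout against the world."""
    op, *args = layout
    if op == "find":
        return find(world, args[0])
    if op == "and":
        return both(run(args[0], world), run(args[1], world))
    if op == "exists":
        return exists(run(args[0], world))
    raise ValueError(f"unknown module: {op}")


# Assumed example: "Is there a red circle?" mapped by hand to a layout.
world = {"obj1": {"red", "circle"}, "obj2": {"blue", "circle"}}
layout = ("exists", ("and", ("find", "red"), ("find", "circle")))
print(run(layout, world))  # yes
```

The same modules are reused across questions, and only the layout changes. That reuse is what lets a fixed inventory of modules cover many different question structures.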

Related benchmarks

Task	Dataset	Result
Visual Question Answering	VQA (test-dev)	Acc (All) 59.4	147
Visual Question Answering	VQA (test-std)	--	120
Open-Ended Visual Question Answering	VQA 1.0 (test-dev)	Overall Accuracy 59.4	100
Visual Question Answering (Multiple-choice)	VQA 1.0 (test-dev)	Accuracy (All) 62.5	66
Open-Ended Visual Question Answering	VQA 1.0 (test-standard)	Overall Accuracy 59.4	50
Visual Question Answer	VQA 1.0 (test-dev)	Overall Accuracy 59.4	44
Open-Ended Visual Question Answering	VQA (test-standard)	Accuracy (Overall) 59.4	32
Visual Question Answering	VQA 1 (test-standard)	VQA Open-Ended Accuracy (All) 59.4	28
Visual Reasoning	NLVR v1 (Test-U)	Accuracy 62	8

