
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

About

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs on a symbolic space is more robust to long program traces; our model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data- and memory-efficient: it performs well after learning from a small amount of training data, and it can encode an image into a compact representation, requiring less storage than existing methods for offline question answering. Third, symbolic program execution offers full transparency to the reasoning process; we are thus able to interpret and diagnose each execution step.

Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Joshua B. Tenenbaum • 2018
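
The "execute the program on the scene representation" step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scene layout (one attribute dictionary per object), the operation names (`scene`, `filter`, `unique`, `count`, `exist`, `query`), the `op:arg` string encoding, and the `execute` function are all assumptions made for illustration. In the real system, the scene and the program would be produced by neural networks; here they are written by hand.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical structural scene representation: one attribute dict per object.
Scene = List[Dict[str, str]]


def execute(program: List[str], scene: Scene) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Run a linear program trace over a symbolic scene.

    Returns the final answer and a per-step trace of intermediate states,
    so every execution step can be inspected.
    """
    state: Any = None
    trace: List[Tuple[str, Any]] = []
    for step in program:
        # Each step is encoded as "op" or "op:arg" (illustrative encoding).
        op, _, arg = step.partition(":")
        if op == "scene":
            state = list(scene)  # start from all objects
        elif op == "filter":
            attr, value = arg.split("=")
            state = [obj for obj in state if obj[attr] == value]
        elif op == "unique":
            if len(state) != 1:
                raise ValueError(f"expected exactly one object, got {len(state)}")
            state = state[0]
        elif op == "count":
            state = len(state)
        elif op == "exist":
            state = len(state) > 0
        elif op == "query":
            state = state[arg]  # read one attribute of a single object
        else:
            raise ValueError(f"unknown operation: {op}")
        trace.append((step, state))
    return state, trace


# Hand-written stand-ins for the outputs of the scene and question parsers.
scene: Scene = [
    {"shape": "cube", "color": "red", "size": "large", "material": "rubber"},
    {"shape": "sphere", "color": "red", "size": "small", "material": "metal"},
    {"shape": "cylinder", "color": "blue", "size": "large", "material": "metal"},
]

# "How many red things are there?"
count_answer, count_trace = execute(["scene", "filter:color=red", "count"], scene)
print(count_answer)  # 2

# "What shape is the blue thing?"
shape_answer, shape_trace = execute(
    ["scene", "filter:color=blue", "unique", "query:shape"], scene
)
print(shape_answer)  # cylinder
```

Because each step is a deterministic operation on explicit objects, the returned trace shows exactly which objects survived each filter, which is one way to picture the transparency the abstract refers to.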

Related benchmarks

Task                           Dataset                     Metric            Result  Rank
Visual Question Answering      CLEVR (test)                Overall Accuracy  99.8    61
Visual Question Answering      CLEVR-Humans                Accuracy          99.8    24
Visual Question Answering      CLEVR-Humans 1.0 (test)     Accuracy          57.8    22
Visual Question Answering      CLEVR-CoGenT (Condition A)  Accuracy          99.8    21
Visual Question Answering      CLEVR-CoGenT Condition B    Accuracy          63.9    18
Visual Question Answering      CLEVR-Humans (test)         Accuracy          67.8    17
Visual Question Answering      CLEVR (val)                 Overall Accuracy  99.8    15
Visual Question Answering      CLEVR-CoGenT (val)          Accuracy          99.8    12
Visual Question Answering      CLEVR Standard v1.0 (val)   Accuracy          98.57   10
Compositional Image Reasoning  CLOSURE                     Accuracy          77.2    5
Showing 10 of 13 rows
