A simple neural network module for relational reasoning
About
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR, we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
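To make the "plug-and-play module" idea concrete, here is a minimal sketch of the RN composite function from the paper, RN(O) = f(Σ over pairs (i, j) of g(o_i, o_j)): a function g scores every ordered pair of objects, the results are summed, and a function f maps the sum to an output. The sketch rests on several assumptions that are not in the text above: g and f are each reduced to a single dense layer rather than the multi-layer perceptrons used in practice, the weights are random and untrained, the question conditioning used for question answering is omitted, and the helper names (`dense`, `init_layer`, `relation_network`) are hypothetical.

```python
import random

def dense(x, weights, biases, relu=True):
    """One dense layer: y = W x + b, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, biases):
        s = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, s) if relu else s)
    return out

def init_layer(n_in, n_out, rng):
    """Random, untrained weights (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def relation_network(objects, g_layer, f_layer):
    """Compute f(sum over all ordered pairs (i, j) of g(o_i, o_j))."""
    g_w, g_b = g_layer
    pooled = [0.0] * len(g_b)
    for o_i in objects:
        for o_j in objects:
            # g sees the two object vectors concatenated into one input.
            rel = dense(o_i + o_j, g_w, g_b)
            pooled = [p + r for p, r in zip(pooled, rel)]
    f_w, f_b = f_layer
    return dense(pooled, f_w, f_b, relu=False)

rng = random.Random(0)
obj_dim, hidden, n_out = 3, 8, 2
g_layer = init_layer(2 * obj_dim, hidden, rng)
f_layer = init_layer(hidden, n_out, rng)

# Four toy "objects"; in a real model these might be CNN feature-map cells.
objects = [[rng.uniform(-1, 1) for _ in range(obj_dim)] for _ in range(4)]
output = relation_network(objects, g_layer, f_layer)
print(output)
```

Because the pairwise results are summed before f is applied, the output does not depend on the order in which the objects are presented, up to floating-point rounding. The cost grows quadratically with the number of objects, since every ordered pair is evaluated.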
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Composed Image Retrieval | FashionIQ (val) | Shirt Recall@10 | 18.33 | 455 |
| 5-way Classification | miniImageNet (test) | Accuracy | 68.9 | 231 |
| Visual Entailment | SNLI-VE (test) | Overall Accuracy | 67.55 | 197 |
| Composed Image Retrieval | Fashion-IQ (test) | Dress Recall@10 | 0.1544 | 145 |
| Question Answering | OpenBookQA (OBQA) (test) | OBQA Accuracy | 65.2 | 130 |
| Commonsense Question Answering | CSQA (test) | Accuracy | 0.7008 | 127 |
| Visual Entailment | SNLI-VE (val) | Overall Accuracy | 67.56 | 109 |
| Few-shot classification | MiniImagenet | 5-way 5-shot Accuracy | 68.88 | 98 |
| Visual Question Answering | CLEVR (test) | Overall Accuracy | 95.5 | 61 |
| Question Answering | CommonsenseQA IH (test) | Accuracy | 69.08 | 57 |