Pixels to Graphs by Associative Embedding
About
Graphs are a useful abstraction of image content. Not only can graphs represent details about individual objects in a scene but they can capture the interactions between pairs of objects. We present a method for training a convolutional neural network such that it takes in an input image and produces a full graph definition. This is done end-to-end in a single stage with the use of associative embeddings. The network learns to simultaneously identify all of the elements that make up a graph and piece them together. We benchmark on the Visual Genome dataset, and demonstrate state-of-the-art performance on the challenging task of scene graph generation.
Alejandro Newell, Jia Deng • 2017
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Scene Graph Generation | Visual Genome (test) | R@50 | 9.7 | 86 |
| Scene Graph Classification | Visual Genome (test) | Recall@100 | 30 | 63 |
| Predicate Classification | Visual Genome | Recall@50 | 68 | 54 |
| Predicate Classification | Visual Genome (test) | R@50 | 68 | 50 |
| Scene Graph Classification | Visual Genome | R@50 | 26.5 | 45 |
| Scene Graph Detection | Visual Genome | Recall@100 | 11.3 | 31 |
| Predicate Classification | Visual Genome 1.0 (test) | R@100 | 75.2 | 22 |
| Relation Prediction | VG200 | R@50 | 54.1 | 13 |
| Scene Graph Generation | Visual Genome 1.0 (test) | Recall@50 | 9.7 | 11 |
| Predicate Classification | Visual Genome v1.2 (test) | Recall@50 | 54.1 | 11 |
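The results above are reported as Recall@K (R@50, R@100). As a rough illustration of how such a number can be computed for one image, the sketch below counts how many ground-truth (subject, predicate, object) triplets appear among the top K ranked predictions. It is a simplified sketch, not the benchmark's evaluation code: the function name `recall_at_k` and the example triplets are hypothetical, and it matches triplets by label only, whereas scene graph benchmarks typically also require the predicted boxes to overlap the ground-truth boxes.

```python
from typing import Hashable, Sequence, Tuple

# A relationship triplet: (subject, predicate, object).
Triplet = Tuple[Hashable, Hashable, Hashable]


def recall_at_k(
    ranked_predictions: Sequence[Triplet],
    ground_truth: Sequence[Triplet],
    k: int,
) -> float:
    """Fraction of ground-truth triplets found in the top-k predictions.

    Assumption: `ranked_predictions` is already sorted by descending
    confidence, and matching is by label only (no box-overlap check).
    """
    gt = set(ground_truth)
    if not gt:
        return 0.0
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for triplet in gt if triplet in top_k)
    return hits / len(gt)


# Hypothetical example data for a single image.
predictions = [
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "on", "grass"),
]
ground_truth = [
    ("man", "riding", "horse"),
    ("horse", "on", "grass"),
]

print(recall_at_k(predictions, ground_truth, k=1))  # 0.5
print(recall_at_k(predictions, ground_truth, k=3))  # 1.0
```

A dataset-level score would typically average this per-image value over all test images and report it as a percentage.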