GPS-Net: Graph Property Sensing Network for Scene Graph Generation
About
Scene graph generation (SGG) aims to detect objects in an image along with their pairwise relationships. Three key properties of scene graphs have been underexplored in recent works: the edge direction information, the difference in priority between nodes, and the long-tailed distribution of relationships. Accordingly, in this paper, we propose a Graph Property Sensing Network (GPS-Net) that fully explores these three properties for SGG. First, we propose a novel message passing module that augments the node feature with node-specific contextual information and encodes the edge direction information via a tri-linear model. Second, we introduce a node priority sensitive loss to reflect the difference in priority between nodes during training. This is achieved by designing a mapping function that adjusts the focusing parameter in the focal loss. Third, since the frequency of relationships is affected by the long-tailed distribution problem, we mitigate this issue by first softening the distribution and then enabling it to be adjusted for each subject-object pair according to their visual appearance. Systematic experiments demonstrate the effectiveness of the proposed techniques. Moreover, GPS-Net achieves state-of-the-art performance on three popular databases, Visual Genome (VG), Open Images (OI), and VRD, with significant gains under various settings and metrics. The code and models are available at https://github.com/taksau/GPS-Net.
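To make the second contribution more concrete, here is a minimal sketch of a focal loss whose focusing parameter depends on a per-node priority. The abstract does not give the mapping function, so `priority_to_gamma` below is a hypothetical linear mapping chosen purely for illustration, and the assumption that priority lies in [0, 1] is also ours; this is not the mapping used by GPS-Net.

```python
import math


def focal_loss(p_true: float, gamma: float) -> float:
    """Standard focal loss for one sample: -(1 - p)^gamma * log(p)."""
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p) ** gamma) * math.log(p)


def priority_to_gamma(priority: float, gamma_max: float = 2.0) -> float:
    """Hypothetical mapping (an assumption, not the paper's function).

    A priority of 1.0 gives gamma = 0 (plain cross-entropy, no down-weighting);
    a priority of 0.0 gives gamma = gamma_max (strongest down-weighting of
    well-classified samples).
    """
    priority = min(max(priority, 0.0), 1.0)
    return gamma_max * (1.0 - priority)


def node_priority_loss(p_true: float, priority: float) -> float:
    """Focal loss with a focusing parameter chosen from the node's priority."""
    return focal_loss(p_true, priority_to_gamma(priority))


if __name__ == "__main__":
    # Same predicted probability, different node priorities.
    print(node_priority_loss(0.9, priority=1.0))
    print(node_priority_loss(0.9, priority=0.0))
```

Under this illustrative mapping, a well-classified high-priority node keeps a larger loss than an equally well-classified low-priority node, so training pays more attention to it.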
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Scene Graph Generation | Visual Genome (test) | R@50 | 0.359 | 86 |
| Relation Detection | VRD (test) | R@50 | 63.4 | 75 |
| Scene Graph Generation | Open Images v6 (test) | wmAP_rel | 32.9 | 74 |
| Scene Graph Classification | VG150 (test) | mR@50 | 11.8 | 66 |
| Scene Graph Classification | Visual Genome (test) | Recall@100 | 12.6 | 63 |
| PredCLS | Action Genome (test) | Recall@10 | 76 | 54 |
| Predicate Classification | Visual Genome | Recall@50 | 65.2 | 54 |
| Predicate Classification | Visual Genome (VG) 150 object categories, 50 relationship categories (test) | mR@100 | 22.8 | 44 |
| Scene Graph Detection | VG150 (test) | ng-mR@50 | 8.7 | 41 |
| Scene Graph Classification | Action Genome (test) | Recall@10 | 45.3 | 40 |
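
Most of the metrics above are variants of Recall@K, the fraction of ground-truth relationship triplets that appear among the K highest-scoring predictions for an image. The sketch below is a simplified illustration: it matches triplets by label only and ignores the bounding-box overlap check that full SGG evaluation also applies. The function name and the example triplets are hypothetical.

```python
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(ranked_predictions: List[Triplet],
                ground_truth: Set[Triplet],
                k: int) -> float:
    """Fraction of ground-truth triplets found in the top-k predictions.

    `ranked_predictions` must already be sorted by descending confidence.
    Simplification: triplets match on labels only (no box IoU test).
    """
    if not ground_truth:
        return 0.0
    top_k = set(ranked_predictions[:k])
    hits = sum(1 for triplet in ground_truth if triplet in top_k)
    return hits / len(ground_truth)


if __name__ == "__main__":
    predictions = [
        ("man", "riding", "horse"),
        ("man", "wearing", "hat"),
        ("horse", "on", "grass"),
    ]
    truth = {("man", "riding", "horse"), ("horse", "on", "grass")}
    print(recall_at_k(predictions, truth, k=2))
    print(recall_at_k(predictions, truth, k=3))
```

Mean recall (mR@K) would average this quantity per predicate class rather than over all triplets, which gives rare predicates the same weight as frequent ones.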