Transformer-based Dual Relation Graph for Multi-label Image Recognition
About
Recognizing multiple objects in a single image simultaneously remains challenging because of varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent work mainly relies on statistical label co-occurrences and linguistic word embeddings to sharpen the unclear semantics. In contrast, we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation: a structural relation graph and a semantic relation graph. The structural relation graph captures long-range correlations from object context through a cross-scale transformer-based architecture. The semantic relation graph dynamically models the semantic meanings of image objects under explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, MS-COCO and VOC 2007.
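To make the idea of a joint relation graph more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the two graphs are fused, so the element-wise sum, the scaled dot-product adjacency, the single propagation step with a residual connection, and all names (`dynamic_adjacency`, `joint_relation_step`, the toy `structural` and `semantic` features) are illustrative assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dynamic_adjacency(nodes):
    """Assumed form of a dynamic graph: row-normalised scaled dot-product
    similarity between per-class node features."""
    scale = math.sqrt(len(nodes[0]))
    scores = [[sum(x * y for x, y in zip(u, v)) / scale for v in nodes] for u in nodes]
    return [softmax(row) for row in scores]


def joint_relation_step(structural, semantic):
    """Hypothetical fusion: add structural features to semantic features,
    build an adjacency from the fused nodes, and propagate once."""
    joint = [[s + t for s, t in zip(s_row, t_row)]
             for s_row, t_row in zip(structural, semantic)]
    adj = dynamic_adjacency(joint)
    propagated = matmul(adj, joint)
    # Residual connection keeps each class's own evidence in the output.
    refined = [[p + j for p, j in zip(p_row, j_row)]
               for p_row, j_row in zip(propagated, joint)]
    return adj, refined


# Toy input: 3 classes with 2-dimensional features (made-up values).
structural = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # stand-in for transformer output
semantic = [[0.5, 0.0], [0.4, 0.2], [0.0, 0.6]]    # stand-in for class semantics
adj, refined = joint_relation_step(structural, semantic)
```

In this toy setup, classes 0 and 1 have similar fused features, so the adjacency row for class 0 puts more weight on class 1 than on class 2. A real model would learn projections before computing similarities and would stack several such layers.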
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 0.106 | 257 |
| Monocular Depth Estimation | KITTI | Abs Rel | 0.064 | 161 |
| Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE | 2.755 | 159 |
| Monocular Depth Estimation | KITTI 80m maximum depth (Eigen) | Abs Rel | 0.064 | 126 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc | 90 | 113 |
| Multi-label image recognition | VOC 2007 (test) | mAP | 95 | 61 |
| Multi-Label Classification | MS-COCO (val) | mAP | 86 | 47 |
| Multi-label recognition | MS-COCO (val) | F1 Score (All) | 80.4 | 18 |
| Multi-label Image Classification | NUS-WIDE | CF1 (Top 3) | 56.1 | 15 |
| Multi-label Image Classification | VG500 | mAP | 37.7 | 11 |