
Transformer-based Dual Relation Graph for Multi-label Image Recognition

About

Recognizing multiple objects in a single image remains a challenging task because of varying object scales, inconsistent appearances, and confusing inter-class relationships. Recent work mainly relies on statistical label co-occurrences and linguistic word embeddings to enhance the unclear semantics. In contrast, in this paper we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph. The structural relation graph aims to capture long-range correlations from object context by developing a cross-scale transformer-based architecture. The semantic relation graph dynamically models the semantic meanings of image objects with explicit semantic-aware constraints. In addition, we incorporate the learnt structural relationship into the semantic graph, constructing a joint relation graph for robust representations. With the collaborative learning of these two relation graphs, our approach achieves new state-of-the-art results on two popular multi-label recognition benchmarks, i.e., MS-COCO and VOC 2007.

Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li • 2021

Related benchmarks

Task | Dataset | Result | Rank
Monocular Depth Estimation | NYU v2 (test) | Abs Rel 0.106 | 257
Monocular Depth Estimation | KITTI | Abs Rel 0.064 | 161
Monocular Depth Estimation | KITTI Raw Eigen (test) | RMSE 2.755 | 159
Monocular Depth Estimation | KITTI 80m maximum depth (Eigen) | Abs Rel 0.064 | 126
Monocular Depth Estimation | NYU V2 | Delta 1 Acc 90 | 113
Multi-label image recognition | VOC 2007 (test) | mAP 95 | 61
Multi-Label Classification | MS-COCO (val) | mAP 86 | 47
Multi-label recognition | MS-COCO (val) | F1 Score (All) 80.4 | 18
Multi-label Image Classification | NUS-WIDE | CF1 (Top 3) 56.1 | 15
Multi-label Image Classification | VG500 | mAP 37.7 | 11
Showing 10 of 11 rows
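
The mAP figures in the table are mean average precision: an average precision (AP) is computed for each class from the ranked prediction scores, and the per-class values are then averaged. The sketch below illustrates the common non-interpolated form in plain Python. The function names and the input layout (one row per image, one column per class) are assumptions made for illustration. Benchmarks differ in the exact AP variant (VOC 2007 has historically used an 11-point interpolated form), so this is not necessarily the protocol behind the numbers above.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: confidence per image for this class.
    labels: 1 if the class is present in the image, else 0.
    """
    # Rank images by descending confidence for this class.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            # Precision at the rank where this positive image appears.
            precisions.append(hits / rank)
    # AP is undefined for a class with no positive images; 0.0 is an
    # assumption made here to keep the sketch simple.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP; both matrices are indexed [image][class]."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / num_classes
```

With three images and two classes, `mean_average_precision([[0.9, 0.1], [0.8, 0.7], [0.3, 0.2]], [[1, 0], [0, 1], [1, 0]])` averages a per-class AP of 5/6 and 1, giving 11/12.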
