Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition
About
Recent studies often use a Graph Convolutional Network (GCN) to model label dependencies and improve accuracy in multi-label image recognition. However, constructing a graph by counting label co-occurrences in the training data may degrade model generalizability, especially when test images contain objects that co-occur only occasionally. Our goal is to eliminate such bias and enhance the robustness of the learnt features. To this end, we propose an Attention-Driven Dynamic Graph Convolutional Network (ADD-GCN) that dynamically generates a specific graph for each image. ADD-GCN adopts a Dynamic Graph Convolutional Network (D-GCN) to model the relations among content-aware category representations generated by a Semantic Attention Module (SAM). Extensive experiments on public multi-label benchmarks demonstrate the effectiveness of our method, which achieves mAPs of 85.2%, 96.0%, and 95.5% on MS-COCO, VOC2007, and VOC2012, respectively, and outperforms current state-of-the-art methods by a clear margin. The code is available at https://github.com/Yejin0111/ADD-GCN.
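The idea of building one graph per image can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a sigmoid attention over spatial locations to pool one representation per category, and a bilinear form to derive the image-specific adjacency; the function names, weight matrices (`w_cls`, `w_adj`, `w_out`), and dimensions are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_attention(features, w_cls):
    """Pool spatial features into one representation per category.

    features: (N, D) array of N spatial locations with D channels.
    w_cls:    (C, D) hypothetical per-category weights.
    Returns a (C, D) array of content-aware category representations.
    """
    scores = features @ w_cls.T                    # (N, C) activation per location
    attn = sigmoid(scores)                         # (N, C) attention weights
    attn = attn / attn.sum(axis=0, keepdims=True)  # normalise over locations
    return attn.T @ features                       # (C, D)


def dynamic_graph_conv(v, w_adj, w_out):
    """One graph-convolution step whose adjacency depends on the input.

    v:     (C, D) category representations for a single image.
    w_adj: (D, D) hypothetical weights of an assumed bilinear adjacency.
    w_out: (D, D_out) hypothetical state-update weights.
    Returns the updated nodes (C, D_out) and the adjacency (C, C).
    """
    adj = sigmoid(v @ w_adj @ v.T)                 # image-specific edge weights
    adj = adj / adj.sum(axis=1, keepdims=True)     # row-normalise
    out = np.maximum(adj @ v @ w_out, 0.0)         # propagate, then ReLU
    return out, adj


rng = np.random.default_rng(0)
num_locations, feat_dim, num_classes = 49, 16, 5
w_cls = rng.normal(size=(num_classes, feat_dim))
w_adj = rng.normal(size=(feat_dim, feat_dim))
w_out = rng.normal(size=(feat_dim, feat_dim))

# Two random "images" stand in for backbone feature maps.
image_a = rng.normal(size=(num_locations, feat_dim))
image_b = rng.normal(size=(num_locations, feat_dim))

out_a, adj_a = dynamic_graph_conv(semantic_attention(image_a, w_cls), w_adj, w_out)
out_b, adj_b = dynamic_graph_conv(semantic_attention(image_b, w_cls), w_adj, w_out)

print(adj_a.shape, out_a.shape)
print("same graph for both images:", np.allclose(adj_a, adj_b))
```

Because the adjacency is computed from each image's own category representations, the two inputs yield different graphs even though the weights are shared. A graph built from training-set co-occurrence counts would instead be identical for every image.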
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-Label Classification | PASCAL VOC 2007 (test) | mAP | 96 | 125 |
| Multi-Label Classification | MS-COCO 2014 (test) | mAP | 85.2 | 81 |
| Multi-label Image Classification | VOC 2012 (test) | mAP | 95.5 | 72 |
| Multi-label image recognition | VOC 2007 (test) | mAP | 96 | 61 |
| Multi-Label Classification | MS-COCO (val) | mAP | 85.7 | 47 |
| Multi-label recognition | PASCAL VOC 2007 (test) | Avg. mAP | 96.1 | 25 |
| Multi-label image recognition | MS-COCO (val) | CP | 88.8 | 23 |
| Multi-label recognition | MS-COCO (val) | F1 Score (All) | 80.1 | 18 |
| Multi-label Image Classification | NUS-WIDE | CF1 (Top 3) | 56.5 | 15 |
| Multi-label Image Classification | VG500 | mAP | 38.2 | 11 |