Learning Deep Features for Discriminative Localization
About
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
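The abstract does not spell out the mechanism, so the following is only a minimal sketch of why global average pooling can yield localization. It assumes that the class score is a single linear layer applied to the globally averaged feature maps of the last convolutional layer. Under that assumption, applying the same class weights at every spatial position gives a per-location map whose spatial mean equals the class score. The function names and the toy values are hypothetical and chosen for illustration only.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map over all spatial positions."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def class_score(feature_maps, class_weights):
    """Linear class score computed on the pooled features (assumed head)."""
    pooled = global_average_pool(feature_maps)
    return sum(w * p for w, p in zip(class_weights, pooled))


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the feature maps at each spatial position."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy input: two 2 x 2 feature maps and one weight per map (illustrative values).
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [3.0, 1.0]

score = class_score(feature_maps, class_weights)
cam = class_activation_map(feature_maps, class_weights)
cam_mean = sum(sum(row) for row in cam) / (len(cam) * len(cam[0]))
```

In this toy case `cam_mean` matches `score`, and the largest entry of `cam` marks the position that contributes most to the class score. A real network would upsample such a map to the input resolution before deriving a bounding box from it.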
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU | 32.2 | 1145 |
| Semantic segmentation | CamVid (test) | mIoU | 6.6 | 411 |
| Semantic segmentation | Cityscapes (val) | mIoU | 33 | 287 |
| Image Classification | CUB-200-2011 (test) | -- | -- | 276 |
| Instance Segmentation | PASCAL VOC 2012 (val) | mAP@0.5 | 7.8 | 173 |
| Visual Question Answering | VQA (test-dev) | Acc (All) | 58.91 | 147 |
| Weakly Supervised Object Localization | CUB (test) | Top-1 Loc Acc | 56.1 | 80 |
| Object Localization | ImageNet-1k (val) | Top-1 Loc Acc | 46.3 | 80 |
| Semantic segmentation | PASCAL VOC 2012 (train) | mIoU | 58.1 | 73 |
| Weakly Supervised Object Localization | CUB | MaxBoxAccV2 | 63.7 | 69 |