Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution
About
Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea, that is, a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-art methods.
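The property described above (high scores for a few proposals, low scores for most) can be illustrated with an entropy measure over proposal scores. This is a minimal sketch, not the paper's actual objective: it assumes the scores are turned into a distribution with a softmax and that Shannon entropy is the quantity of interest. The function name `proposal_entropy` and the example score vectors are hypothetical.

```python
import numpy as np

def proposal_entropy(scores):
    """Shannon entropy of the softmax distribution over proposal scores.

    Assumption: softmax normalization; the paper's exact formulation may differ.
    """
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()            # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

# Hypothetical scores for five proposals from one image.
peaked = [8.0, 0.1, 0.0, -0.2, 0.1]   # one confident proposal, as a good detector would give
flat = [1.0, 1.0, 1.0, 1.0, 1.0]      # no proposal stands out

print(proposal_entropy(peaked), proposal_entropy(flat))
```

Under these assumptions, the peaked score vector yields a lower entropy than the flat one, so minimizing such a term would push a detector toward the confident, sparse scoring pattern the abstract describes.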
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Co-localization | VOC 2007 | Aero Acc | 73.1 | 13 |
| Unsupervised Object Discovery | VOC 2007 (train+val) | CorLoc | 40 | 13 |
| Co-localization | PASCAL VOC 2007 (test) | CorLoc (aero) | 73.1 | 12 |
| Co-localization | Object Discovery ImageNet-disjoint categories (test) | Chipmunk | 44.9 | 8 |
| Single-object Colocalization | VOC all 2007 | CorLoc | 41.9 | 6 |
| Co-localization | PASCAL VOC 2012 | Aero | 65.7 | 5 |
| Colocalization | ImageNet 6 held-out classes (test) | Colocalization | 51.6 | 4 |
| Single-object Colocalization | VOC 2012 | CorLoc | 45.6 | 4 |
| Co-localization | PASCAL VOC 2012 (trainval) | Aero | 65.7 | 3 |