Localizing Objects with Self-Supervised Transformers and no Labels
About
Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, requires neither external object proposals nor any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | PASCAL VOC 2007 (test) | mAP 29.9 | 821 |
| Object Detection | VOC 2007 (test) | -- | 52 |
| Object Localization | PASCAL VOC 2012 (trainval) | CorLoc 64 | 51 |
| Salient Object Detection | ECSSD 1,000 images (test) | -- | 48 |
| Saliency Detection | DUT-OMRON 29 (test) | IoU 48.9 | 38 |
| Unsupervised single object discovery | VOC 2007 (test) | CorLoc 65.7 | 34 |
| Unsupervised single object discovery | VOC 2012 (test) | CorLoc 70.4 | 34 |
| Unsupervised single object discovery | COCO20K 2014 (train) | CorLoc 57.5 | 33 |
| Single-object discovery | PASCAL VOC 2007 (trainval) | CorLoc 65.7 | 26 |
| Saliency Detection | DUTS (test) | IoU 57.2 | 22 |
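
Most results above are reported in CorLoc, commonly defined as the percentage of images in which the single predicted box overlaps at least one ground-truth box with an intersection-over-union (IoU) of at least 0.5. The following is a minimal sketch of that metric, not the evaluation code from the LOST repository. The function names `iou` and `corloc`, the `(x_min, y_min, x_max, y_max)` box format, and the inclusive threshold are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in continuous coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(
    predictions: Sequence[Box],
    ground_truths: Sequence[Sequence[Box]],
    threshold: float = 0.5,  # assumed inclusive; some implementations use a strict ">"
) -> float:
    """Percentage of images whose predicted box matches any ground-truth box."""
    if len(predictions) != len(ground_truths):
        raise ValueError("need exactly one prediction per image")
    if not predictions:
        return 0.0
    hits = sum(
        any(iou(pred, gt) >= threshold for gt in gts)
        for pred, gts in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)


# Two images: the first prediction matches its ground truth, the second misses.
preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
gts = [[(0.0, 0.0, 10.0, 10.0)], [(20.0, 20.0, 30.0, 30.0)]]
score = corloc(preds, gts)
```

Because only one box per image is scored, CorLoc measures whether a method finds some object in each image, not whether it finds all of them; that is why the detection row is reported in mAP instead.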