Zero-Shot Object Detection
About
We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approaches for learning robust detectors. One of these models uses a fixed background class and the other is based on iterative latent assignments. We also outline the challenge associated with using a limited number of training classes and propose a solution based on dense sampling of the semantic label space using auxiliary data with a large number of categories. We propose novel splits of two standard detection datasets, MSCOCO and VisualGenome, and present extensive empirical results in both the traditional and generalized zero-shot settings to highlight the benefits of the proposed methods. We provide useful insights into the algorithm and conclude by posing some open questions to encourage further research.
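To make the idea of visual-semantic embeddings for detection concrete, here is a minimal sketch, not the authors' implementation. It assumes that region features have already been extracted for candidate boxes, that a projection matrix `W` into the semantic (word-embedding) space has been learned on seen classes, and that each unseen class has a semantic embedding. All names (`zero_shot_scores`, `predict_unseen`, `W`) are hypothetical, and background handling is omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def zero_shot_scores(region_feats, W, class_embs):
    """Cosine similarity between projected region features and class embeddings.

    region_feats: (N, d_vis) visual features, one row per candidate box
    W:            (d_vis, d_sem) projection, assumed trained on seen classes
    class_embs:   (C, d_sem) semantic embeddings of the candidate classes
    returns:      (N, C) similarity matrix
    """
    projected = l2_normalize(region_feats @ W)
    return projected @ l2_normalize(class_embs).T

def predict_unseen(region_feats, W, unseen_embs, unseen_names):
    """Label each region with the most similar unseen class and its score."""
    scores = zero_shot_scores(region_feats, W, unseen_embs)
    best = scores.argmax(axis=1)
    return [(unseen_names[j], float(scores[i, j])) for i, j in enumerate(best)]

# Toy usage: identity projection and 3-dimensional embeddings.
feats = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
unseen = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.1]])
predictions = predict_unseen(feats, np.eye(3), unseen, ["cat", "dog"])
```

Because classes are compared only through their embeddings, a class with no training boxes can still be scored at test time. A practical detector would also need a way to reject background regions, which is the issue the background-aware approaches in the abstract address.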
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | -- | 2454 |
| Object Detection | OV-COCO | AP50 (Novel) | 30 | 97 |
| Object Detection | COCO open-vocabulary (test) | -- | -- | 25 |
| Object Detection | MS-COCO 48/17 base/novel | GZSD All AP50 | 24.9 | 21 |
| Object Detection | MS-COCO (48/17) | Recall@100 (IoU=0.5) | 27.2 | 19 |
| Zero-shot Object Detection | MS-COCO (48/17) | Recall@100 (IoU=0.5) | 27.2 | 16 |
| Object Detection | MS-COCO Generalized (Novel) | mAP50 | 0.31 | 14 |
| Object Detection | COCO novel and base categories 2014 | -- | -- | 12 |
| Object Detection | MSCOCO (48/17) | mAP (Base) | 0.292 | 11 |
| Object Detection | COCO Open-vocabulary 2 (test) | mAP50 (Box, All) | 24.9 | 9 |
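The Recall@100 (IoU=0.5) entries in the table can be read as the fraction of ground-truth boxes matched by at least one of the 100 highest-scoring detections with an intersection-over-union of 0.5 or more. The sketch below illustrates that reading for a single image and ignores class labels. It is a simplified assumption, not the official evaluation protocol of any benchmark listed here, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(gt_boxes, detections, k=100, iou_thresh=0.5):
    """Fraction of ground-truth boxes covered by any of the top-k detections.

    gt_boxes:   list of (x1, y1, x2, y2)
    detections: list of (score, (x1, y1, x2, y2))
    """
    if not gt_boxes:
        return 0.0
    # Keep only the k highest-scoring detections.
    top = sorted(detections, key=lambda d: d[0], reverse=True)[:k]
    hits = sum(
        any(iou(gt, box) >= iou_thresh for _, box in top) for gt in gt_boxes
    )
    return hits / len(gt_boxes)

# Toy usage: two ground-truth boxes, only the first one is detected.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 9)), (0.8, (50, 50, 60, 60))]
recall = recall_at_k(gt, dets)
```

Full benchmark protocols typically match detections to ground truth per class and aggregate over the whole dataset, so numbers from this sketch are not comparable with the table values.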