PromptDet: Towards Open-vocabulary Detection using Uncurated Images

About

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. To achieve that, we make the following four contributions: (i) in pursuit of generalisation, we propose a two-stage open-vocabulary object detector, where the class-agnostic object proposals are classified with a text encoder from a pre-trained visual-language model; (ii) to pair the visual latent space (of RPN box proposals) with that of the pre-trained text encoder, we propose the idea of regional prompt learning to align the textual embedding space with regional visual object features; (iii) to scale up the learning procedure towards detecting a wider spectrum of objects, we exploit the available online resources via a novel self-training framework, which allows the proposed detector to be trained on a large corpus of noisy, uncurated web images; (iv) lastly, to evaluate our proposed detector, termed PromptDet, we conduct extensive experiments on the challenging LVIS and MS-COCO datasets. PromptDet shows superior performance over existing approaches with fewer additional training images and zero manual annotations whatsoever. Project page with code: https://fcjian.github.io/promptdet.

Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma • 2022
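
Contribution (i) in the abstract, classifying class-agnostic proposals with text embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature value, and the toy three-dimensional embeddings are all assumptions made for illustration. In a real detector the region embeddings would come from the proposal features and the category embeddings from the pre-trained text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_regions(region_embeddings, text_embeddings, temperature=0.01):
    """Assign each region the category whose text embedding is most similar.

    region_embeddings: list of vectors, one per class-agnostic proposal (toy data here).
    text_embeddings:   dict mapping category name -> vector (toy data here).
    temperature:       hypothetical softmax temperature, chosen for illustration only.
    """
    names = list(text_embeddings)
    text_unit = [l2_normalize(text_embeddings[name]) for name in names]
    results = []
    for region in region_embeddings:
        region_unit = l2_normalize(region)
        # Cosine similarity to every category embedding, scaled by the temperature.
        logits = [
            sum(a * b for a, b in zip(region_unit, t)) / temperature
            for t in text_unit
        ]
        # Numerically stable softmax over the categories.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs[best]))
    return results


# Toy embeddings standing in for text-encoder outputs (assumed values).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}
# Toy embeddings standing in for two region proposals (assumed values).
region_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

predictions = classify_regions(region_embeddings, text_embeddings)
for label, score in predictions:
    print(label, round(score, 3))
```

Because the classifier weights are just text embeddings, adding a novel category in this sketch only requires adding one more entry to `text_embeddings`, which is the property an open-vocabulary detector relies on.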

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Object Detection | COCO 2017 (val) | -- | 2454 |
| Object Detection | LVIS v1.0 (val) | APbbox 25.3 | 518 |
| Instance Segmentation | LVIS v1.0 (val) | -- | 189 |
| Object Detection | OV-COCO | AP50 (Novel) 26.6 | 97 |
| Instance Segmentation | LVIS | mAP (Mask) 25.3 | 68 |
| Open-vocabulary object detection | LVIS v1 (val) | AP_r^b 21.4 | 54 |
| Instance Segmentation | LVIS (val) | APr 21.4 | 46 |
| Object Detection | COCO open-vocabulary (test) | Novel AP 26.6 | 25 |
| Open-vocabulary object detection | OV-LVIS | AP Novel 19 | 18 |
| Object Detection | OV-LVIS v1 (val) | AP_mask_novel 19 | 17 |
Showing 10 of 21 rows
