Hyperbolic Learning with Synthetic Captions for Open-World Detection

About

Open-world detection poses significant challenges, as it requires detecting any object specified by either object class labels or free-form text. Existing works often train on large-scale, manually annotated caption datasets, which are extremely expensive to collect. Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich open-vocabulary descriptions automatically. Specifically, we bootstrap dense synthetic captions using pre-trained VLMs to provide rich descriptions of different regions in images, and use these captions to train a novel detector that generalizes to novel concepts. To mitigate the noise caused by hallucination in synthetic captions, we also propose a novel hyperbolic vision-language learning approach that imposes a hierarchy between visual and caption embeddings. We call our detector "HyperLearner". We conduct extensive experiments on a wide variety of open-world detection benchmarks (COCO, LVIS, Object Detection in the Wild, RefCOCO), and our results show that our model consistently outperforms existing state-of-the-art methods, such as GLIP, GLIPv2 and Grounding DINO, when using the same backbone.
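The abstract does not spell out the hyperbolic objective. As one illustration of how a hierarchy between caption and visual embeddings can be imposed in hyperbolic space, the sketch below uses a swapped-in, well-known technique: hyperbolic entailment cones on the Poincaré ball (Ganea et al., 2018). It assumes curvature -1, treats a caption embedding as the parent (more general, closer to the origin) and a region embedding as the child, and uses hypothetical names and hyperparameters (`expmap0`, `entailment_energy`, `hierarchy_loss`, `k=0.1`, `margin=0.5`). It is a minimal sketch, not HyperLearner's actual formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Vector) -> float:
    return math.sqrt(_dot(a, a))


def expmap0(v: Vector) -> List[float]:
    """Exponential map at the origin of the Poincare ball (curvature -1).

    Sends a Euclidean encoder output, treated as a tangent vector, into the open unit ball.
    """
    n = _norm(v)
    if n == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(n) / n
    return [scale * x for x in v]


def poincare_distance(x: Vector, y: Vector) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    diff = [a - b for a, b in zip(x, y)]
    ratio = 2.0 * _dot(diff, diff) / ((1.0 - _dot(x, x)) * (1.0 - _dot(y, y)))
    return math.acosh(1.0 + ratio)


def half_aperture(x: Vector, k: float = 0.1, eps: float = 1e-9) -> float:
    """Half-aperture of the cone rooted at x; it narrows as x moves toward the boundary."""
    nx = max(_norm(x), eps)
    # Clamp to 1 so points very close to the origin get a cone of half-aperture pi/2.
    return math.asin(min(1.0, k * (1.0 - nx * nx) / nx))


def exterior_angle(x: Vector, y: Vector, eps: float = 1e-9) -> float:
    """Angle at x between the continuation of the origin->x geodesic and the x->y geodesic."""
    xy, nx2, ny2 = _dot(x, y), _dot(x, x), _dot(y, y)
    diff = [a - b for a, b in zip(x, y)]
    num = xy * (1.0 + nx2) - nx2 * (1.0 + ny2)
    den = math.sqrt(nx2) * _norm(diff) * math.sqrt(max(1.0 + nx2 * ny2 - 2.0 * xy, 0.0))
    # Clamp the cosine into [-1, 1] to absorb floating-point error before acos.
    return math.acos(max(-1.0, min(1.0, num / max(den, eps))))


def entailment_energy(parent: Vector, child: Vector, k: float = 0.1) -> float:
    """Zero when child lies inside parent's cone; grows with the angular violation otherwise."""
    return max(0.0, exterior_angle(parent, child) - half_aperture(parent, k))


def hierarchy_loss(parent: Vector, pos_child: Vector, neg_child: Vector,
                   margin: float = 0.5, k: float = 0.1) -> float:
    """Pull a matching child into the parent's cone; push a non-matching one at least `margin` out."""
    pos = entailment_energy(parent, pos_child, k)
    neg = entailment_energy(parent, neg_child, k)
    return pos + max(0.0, margin - neg)


if __name__ == "__main__":
    caption = [0.5, 0.0]     # hypothetical caption point (parent, nearer the origin)
    region = [0.8, 0.0]      # hypothetical matching region further along the same direction
    unrelated = [-0.5, 0.0]  # hypothetical non-matching region
    print(entailment_energy(caption, region))              # 0.0
    print(hierarchy_loss(caption, region, unrelated))      # 0.0
```

A cone encodes a partial order rather than a similarity: a region can fall inside a caption's cone without the two embeddings being close, so the constraint is looser than pulling matched pairs together. Whether HyperLearner uses cones, a different hyperbolic model (for example the Lorentz model), or the opposite parent/child direction is not stated in the abstract.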

Fanjie Kong, Yanbei Chen, Jiarui Cai, Davide Modolo (2024)

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Object Detection | COCO 2017 (val) | AP | 57.4 | 2454
Referring Expression Comprehension | RefCOCO (testA) | -- | -- | 333
Object Detection | LVIS (minival) | AP | 31.3 | 127
Referring Expression Comprehension | RefCOCO v1 (val) | Top-1 Accuracy | 90.74 | 49
Object Detection | ODinW (test) | mAP | 68.9 | 41
Referring Expression Comprehension | RefCOCO+ v1 (val) | Top-1 Accuracy | 82.35 | 13
Referring Expression Comprehension | RefCOCO v1 (testB) | Top-1 Accuracy | 85.46 | 13
Referring Expression Comprehension | RefCOCO+ v1 (testA) | Top-1 Accuracy | 84.7 | 13
Referring Expression Comprehension | RefCOCOg v1 (val) | Top-1 Accuracy | 82.53 | 13
Referring Expression Comprehension | RefCOCO+ v1 (testB) | Top-1 Accuracy | 72.64 | 12

Only 10 of the 11 benchmark rows are listed.
