Hyperbolic Learning with Synthetic Captions for Open-World Detection

About

Open-world detection poses significant challenges, as it requires detecting any object given either object class labels or free-form text. Existing related works often train on large-scale manually annotated caption datasets, which are extremely expensive to collect. Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich open-vocabulary descriptions automatically. Specifically, we bootstrap dense synthetic captions using pre-trained VLMs to provide rich descriptions of different regions in images, and incorporate these captions to train a novel detector that generalizes to novel concepts. To mitigate the noise caused by hallucination in synthetic captions, we also propose a novel hyperbolic vision-language learning approach that imposes a hierarchy between visual and caption embeddings. We call our detector "HyperLearner". We conduct extensive experiments on a wide variety of open-world detection benchmarks (COCO, LVIS, Object Detection in the Wild, RefCOCO), and our results show that our model consistently outperforms existing state-of-the-art methods, such as GLIP, GLIPv2 and Grounding DINO, when using the same backbone.

Fanjie Kong, Yanbei Chen, Jiarui Cai, Davide Modolo • 2024

Related benchmarks

Task	Dataset	Metric	Result
Object Detection	COCO 2017 (val)	AP	57.4
Referring Expression Comprehension	RefCOCO (testA)	--	--
Object Detection	LVIS (minival)	AP	31.3
Referring Expression Comprehension	RefCOCO v1 (val)	Top-1 Accuracy	90.74
Object Detection	ODinW (test)	mAP	68.9
Referring Expression Comprehension	RefCOCO+ v1 (val)	Top-1 Accuracy	82.35
Referring Expression Comprehension	RefCOCO v1 (testB)	Top-1 Accuracy	85.46
Referring Expression Comprehension	RefCOCO+ v1 (testA)	Top-1 Accuracy	84.7
Referring Expression Comprehension	RefCOCOg v1 (val)	Top-1 Accuracy	82.53
Referring Expression Comprehension	RefCOCO+ v1 (testB)	Top-1 Accuracy	72.64
