
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

About

In pursuit of detecting unrestricted objects that extend beyond predefined categories, prior work on open-vocabulary object detection (OVD) typically resorts to pretrained vision-language models (VLMs) for base-to-novel category generalization. However, to mitigate the misalignment between upstream image-text pretraining and downstream region-level perception, additional supervision is indispensable, e.g., image-text pairs or pseudo annotations generated via self-training strategies. In this work, we propose CCKT-Det, which is trained without any extra supervision. The proposed framework constructs a cyclic and dynamic knowledge transfer from language queries and visual region features extracted from VLMs, which forces the detector to closely align with the visual-semantic space of VLMs. Specifically, we 1) prefilter and inject semantic priors to guide the learning of queries, and 2) introduce a regional contrastive loss to improve the awareness of queries on novel objects. CCKT-Det consistently improves performance as the scale of VLMs increases, while keeping the detector's computational overhead moderate. Comprehensive experimental results demonstrate that our method achieves gains of +2.9% and +10.2% AP50 over the previous state of the art on the challenging COCO benchmark, without and with a stronger teacher model, respectively.
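The abstract names a regional contrastive loss but does not give its form. As a rough illustration only, the sketch below assumes an InfoNCE-style objective in which each detector query embedding is pulled toward its matched VLM region feature and pushed away from the other regions in the batch. The function names, the one-to-one query/region pairing, and the temperature value are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def regional_contrastive_loss(query_embs, region_embs, temperature=0.07):
    """Hypothetical InfoNCE-style loss between queries and region features.

    Assumption: query_embs[i] is matched to region_embs[i]; every other
    region in the batch acts as a negative for query i.
    """
    if not query_embs or len(query_embs) != len(region_embs):
        raise ValueError("need equally sized, non-empty query and region lists")

    queries = [l2_normalize(v) for v in query_embs]
    regions = [l2_normalize(v) for v in region_embs]

    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of query i to every region, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, r)) / temperature for r in regions]
        # Numerically stable log-sum-exp over all regions.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matched region as the positive class.
        total += log_denom - logits[i]
    return total / len(queries)


aligned = regional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
swapped = regional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[0.0, 1.0], [1.0, 0.0]])
```

Under these assumptions, queries that already match their region features give a loss near zero, while mismatched pairs give a large loss, which is the behaviour a contrastive alignment term is meant to have.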

Chuhan Zhang, Chaoyang Zhu, Pingcheng Dong, Long Chen, Dong Zhang • 2025

Related benchmarks

Task              Dataset                   Result             Rank
Object Detection  COCO                      AP50 (Box): 53.2   190
Object Detection  LVIS                      APr: 18.2          59
Object Detection  Object365                 AP: 13.4           17
Object Detection  COCO Novel Base All 2017  AP Novel: 46       9
