
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

About

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details because its training code is unavailable. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We provide a comprehensive analysis of each reported result and the detailed settings needed for reproduction. Extensive experiments on these benchmarks demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community. Code and trained models are available at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.

Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang • 2024

Related benchmarks

Task                               | Dataset          | Metric       | Result | Rank
-----------------------------------|------------------|--------------|--------|-----
Object Detection                   | COCO 2017 (val)  | AP           | 50.4   | 2643
Object Detection                   | LVIS v1.0 (val)  | AP (bbox)    | 31.9   | 529
Referring Expression Comprehension | RefCOCO+ (val)   | Accuracy (%) | 82.1   | 354
Referring Expression Comprehension | RefCOCO (val)    | Accuracy (%) | 89.5   | 344
Referring Expression Comprehension | RefCOCO (testA)  | Accuracy (%) | 91.4   | 342
Referring Expression Comprehension | RefCOCOg (test)  | Accuracy (%) | 85.8   | 300
Referring Expression Comprehension | RefCOCOg (val)   | Accuracy (%) | 85.5   | 300
Referring Expression Comprehension | RefCOCO+ (testB) | Accuracy (%) | 74.0   | 244
Referring Expression Comprehension | RefCOCO+ (testA) | Accuracy (%) | 87.5   | 216
Referring Expression Comprehension | RefCOCO (testB)  | Accuracy (%) | 86.6   | 205
Showing 10 of 36 benchmark rows.
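The RefCOCO-family accuracy figures above follow the standard REC evaluation protocol: a predicted box counts as correct when its IoU with the ground-truth box for the referring expression exceeds 0.5. A minimal sketch of that metric (the boxes below are hypothetical, in `(x1, y1, x2, y2)` pixel coordinates):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def rec_accuracy(preds, gts, thresh=0.5):
    """Fraction of predictions whose IoU with the ground truth exceeds thresh."""
    hits = sum(iou(p, g) > thresh for p, g in zip(preds, gts))
    return hits / len(gts)

# Hypothetical predictions vs. ground truths for three referring expressions.
preds = [(10, 10, 50, 50), (0, 0, 20, 20), (30, 30, 80, 90)]
gts   = [(12, 12, 48, 52), (40, 40, 60, 60), (32, 28, 78, 88)]
print(rec_accuracy(preds, gts))  # 2 of 3 predictions pass IoU > 0.5
```

Multiplying this fraction by 100 gives the percentage figures reported in the table.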
