An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
About
Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details because its training code is unavailable. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. Extensive experiments on the benchmarks below demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all models to the research community; code and trained models are available at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | AP | 50.4 | 2454 |
| Object Detection | LVIS v1.0 (val) | AP (bbox) | 31.9 | 518 |
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy | 82.1 | 345 |
| Referring Expression Comprehension | RefCOCO (val) | Accuracy | 89.5 | 335 |
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy | 91.4 | 333 |
| Referring Expression Comprehension | RefCOCOg (test) | Accuracy | 85.8 | 291 |
| Referring Expression Comprehension | RefCOCOg (val) | Accuracy | 85.5 | 291 |
| Referring Expression Comprehension | RefCOCO+ (testB) | Accuracy | 74.0 | 235 |
| Referring Expression Comprehension | RefCOCO+ (testA) | Accuracy | 87.5 | 207 |
| Referring Expression Comprehension | RefCOCO (testB) | Accuracy | 86.6 | 196 |