
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

About

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details because its training code has not been released. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. Extensive experiments on the benchmarks listed below demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community. Code and trained models are available at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.
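To make the task setup concrete, the sketch below illustrates how open-vocabulary prompts are typically constructed for Grounding-DINO-style models: the category names are concatenated into a single caption separated by " . ", and predicted boxes are grounded back to spans of that caption. This is an illustrative approximation, not code from the MM-Grounding-DINO repository; the function names are hypothetical.

```python
# Illustrative sketch (not from the MM-Grounding-DINO codebase): building an
# open-vocabulary detection caption and mapping category names back to their
# character spans, as Grounding-DINO-style models ground boxes to caption spans.

def build_caption(categories):
    """Join category names into one caption, e.g. 'cat . dog . person .'."""
    return " . ".join(categories) + " ."

def phrase_spans(categories):
    """Map each category to its (start, end) character span in the caption,
    so predicted span activations can be traced back to a category name."""
    spans, pos = {}, 0
    for name in categories:
        spans[name] = (pos, pos + len(name))
        pos += len(name) + len(" . ")
    return spans

categories = ["cat", "dog", "person"]
caption = build_caption(categories)   # "cat . dog . person ."
spans = phrase_spans(categories)      # spans["dog"] == (6, 9)
```

For REC, the same interface is used with a free-form expression (e.g. "the dog on the left") as the caption instead of a category list, and the model returns the single best-matching box.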

Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Object Detection | COCO 2017 (val) | AP | 50.4 | 2454 |
| Object Detection | LVIS v1.0 (val) | AP (bbox) | 31.9 | 518 |
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy (%) | 82.1 | 345 |
| Referring Expression Comprehension | RefCOCO (val) | Accuracy (%) | 89.5 | 335 |
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy (%) | 91.4 | 333 |
| Referring Expression Comprehension | RefCOCOg (test) | Accuracy (%) | 85.8 | 291 |
| Referring Expression Comprehension | RefCOCOg (val) | Accuracy (%) | 85.5 | 291 |
| Referring Expression Comprehension | RefCOCO+ (testB) | Accuracy (%) | 74.0 | 235 |
| Referring Expression Comprehension | RefCOCO+ (testA) | Accuracy (%) | 87.5 | 207 |
| Referring Expression Comprehension | RefCOCO (testB) | Accuracy (%) | 86.6 | 196 |

Showing 10 of 32 rows.
