Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

About

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training. It can detect arbitrary objects given human inputs such as category names or referring expressions. The key to open-set object detection is introducing language to a closed-set detector so that it generalizes to open-set concepts. To effectively fuse the language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean of 26.1 AP. Code will be available at https://github.com/IDEA-Research/GroundingDINO.

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang • 2023
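
The abstract names language-guided query selection without describing how it works. One plausible reading, sketched here in plain Python, is to score each image token by its highest similarity to any text token and keep the top-scoring tokens as decoder queries. The function name `select_queries`, the dot-product similarity, and the toy feature vectors are assumptions made for illustration; they are not taken from the released code.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_queries(
    image_feats: List[Vector],
    text_feats: List[Vector],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper: an illustrative sketch, not the authors' implementation.
    """
    # Score each image token by its best match over all text tokens.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens by score, highest first; sorted() is stable, so ties
    # keep the lower index first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy example: four 2-D image tokens and two 2-D text tokens (made-up values).
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]
selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)
```

In this toy input, the image tokens that align best with the text are the ones passed on as queries, which is the sense in which the selection is guided by language.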

Related benchmarks

Task	Dataset	Result
Object Detection	COCO 2017 (val)	AP 63	2843
Object Detection	COCO (val)	mAP 48.4	637
Object Detection	LVIS v1.0 (val)	APbbox 32.3	542
Object Detection	COCO v2017 (test-dev)	mAP 63	499
Referring Expression Comprehension	RefCOCO+ (val)	Accuracy 82.8	354
Referring Expression Comprehension	RefCOCO (val)	Accuracy 90.6	348
Referring Expression Comprehension	RefCOCO (testA)	Accuracy 93.19	346
Object Detection	COCO 2017	AP (Box) 62.6	345
Reasoning Segmentation	ReasonSeg (val)	gIoU 26	327
Object Counting	FSC-147 (test)	MAE 59.23	322

Showing 10 of 321 rows
