
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

About

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects given human inputs such as category names or referring expressions. The key to open-set object detection is introducing language into a closed-set detector for open-set concept generalization. To effectively fuse the language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection mechanism, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also evaluate referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves $52.5$ AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean $26.1$ AP. Code will be available at \url{https://github.com/IDEA-Research/GroundingDINO}.
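The language-guided query selection described above can be illustrated with a minimal sketch: score each image token by its best similarity against the text tokens, then initialize the decoder queries from the top-scoring image tokens. The function name and the plain dot-product similarity here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def language_guided_query_selection(image_feats, text_feats, num_queries):
    """Select the image tokens most similar to any text token.

    image_feats: (num_image_tokens, d) array of image token features
    text_feats:  (num_text_tokens, d) array of text token features
    Returns indices of the selected image tokens, which would be used
    to initialize the decoder queries.
    """
    # similarity between every image token and every text token
    sim = image_feats @ text_feats.T          # (n_img, n_txt)
    # score each image token by its best-matching text token
    scores = sim.max(axis=1)                  # (n_img,)
    # keep the num_queries highest-scoring image tokens
    return np.argsort(-scores)[:num_queries]

# toy example: 6 image tokens, 2 text tokens, select 3 queries
rng = np.random.default_rng(0)
img = rng.standard_normal((6, 4))
txt = rng.standard_normal((2, 4))
idx = language_guided_query_selection(img, txt, 3)
print(idx)
```

In the actual model the selected tokens seed cross-modality queries that are then refined by the cross-modality decoder; this sketch only shows the selection step.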

Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | AP | 63 | 2454 |
| Object Detection | COCO (val) | mAP | 48.4 | 613 |
| Object Detection | LVIS v1.0 (val) | AP (box) | 32.3 | 518 |
| Object Detection | COCO 2017 (test-dev) | mAP | 63 | 499 |
| Referring Expression Comprehension | RefCOCO+ (val) | Accuracy | 82.8 | 345 |
| Referring Expression Comprehension | RefCOCO (val) | Accuracy | 90.6 | 335 |
| Referring Expression Comprehension | RefCOCO (testA) | Accuracy | 93.19 | 333 |
| Object Counting | FSC-147 (test) | MAE | 59.23 | 297 |
| Referring Expression Comprehension | RefCOCOg (test) | Accuracy | 87.02 | 291 |
| Referring Expression Comprehension | RefCOCOg (val) | Accuracy | 86.13 | 291 |

Showing 10 of 179 rows.
