
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

About

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer by layer. Using box coordinates not only lets us use explicit positional priors to improve query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer by layer in a cascade manner. As a result, it achieves the best performance on the MS-COCO benchmark among DETR-like detection models under the same setting, e.g., 45.7% AP using ResNet50-DC5 as the backbone trained for 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods. Code is available at https://github.com/SlongLiu/DAB-DETR.
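The abstract gives no formulas, so the following is only an illustrative sketch in plain Python of the two mechanisms it names, not the authors' code from the linked repository. It assumes that an anchor is a normalized (x, y, w, h) box, that each decoder layer refines the anchor by adding offsets in inverse-sigmoid (logit) space, and that width/height modulation scales the x and y terms of a sinusoidal positional similarity by w_ref / w and h_ref / h. The reference size `ref_wh`, the per-layer offsets, and every function name here are hypothetical stand-ins for values and modules a real model would learn.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def inverse_sigmoid(p, eps=1e-5):
    # Clamp so that log() stays finite at the edges of (0, 1).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def update_anchor(anchor, delta):
    """Refine a normalized (x, y, w, h) anchor by adding offsets in logit space."""
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


def sine_embed(coord, dim=8, temperature=10000.0):
    """Transformer-style sinusoidal embedding of one scalar coordinate."""
    emb = []
    for i in range(dim // 2):
        angle = 2.0 * math.pi * coord / temperature ** (2.0 * i / dim)
        emb.extend((math.sin(angle), math.cos(angle)))
    return emb


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def modulated_logit(px, py, anchor, ref_wh, dim=8):
    """Positional attention logit between position (px, py) and an anchor box.

    The x term is scaled by w_ref / w and the y term by h_ref / h, so a wider
    (or taller) box gives smaller logits and therefore a flatter attention map.
    """
    x, y, w, h = anchor
    w_ref, h_ref = ref_wh
    sx = dot(sine_embed(px, dim), sine_embed(x, dim)) * w_ref / w
    sy = dot(sine_embed(py, dim), sine_embed(y, dim)) * h_ref / h
    return (sx + sy) / math.sqrt(dim)


def attention_map(anchor, ref_wh, grid):
    """Softmax of the modulated logits over every (px, py) in a square grid."""
    logits = {(px, py): modulated_logit(px, py, anchor, ref_wh)
              for px in grid for py in grid}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


if __name__ == "__main__":
    # Cascade: each "decoder layer" nudges the anchor; these offsets are made up.
    anchor = (0.5, 0.5, 0.2, 0.2)
    for delta in [(0.1, 0.0, 0.2, 0.0), (-0.05, 0.05, 0.0, 0.1)]:
        anchor = update_anchor(anchor, delta)
    print([round(v, 3) for v in anchor])

    grid = [i / 10 for i in range(11)]
    narrow = attention_map((0.5, 0.5, 0.1, 0.1), (0.1, 0.1), grid)
    wide = attention_map((0.5, 0.5, 0.5, 0.5), (0.1, 0.1), grid)
    print(max(narrow.values()) > max(wide.values()))  # True: wider box, flatter map
```

In this sketch both maps peak at the anchor center, and the wider box spreads its weight over more positions. Summing image features weighted by such a map behaves like a soft pooling over the box region, which is one way to read the abstract's "soft ROI pooling" interpretation; updating the anchor in logit space keeps every coordinate inside (0, 1) across the cascade.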

Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Object Detection | COCO 2017 (val) | AP | 49 | 2454
Object Detection | COCO (val) | mAP | 46.9 | 613
Video Object Detection | ImageNet VID (val) | mAP (%) | 54.2 | 341
Object Detection | MS-COCO 2017 (val) | -- | -- | 237
Object Detection | COCO (minival) | mAP | 38 | 184
Object Detection | AI-TOD (test) | AP@0.5 | 42.6 | 88
Object Detection | Pascal VOC | mAP | 57.9 | 88
Object Detection | DOTA v1.5 | mAP | 23.3 | 37
Object Detection | SARDet-100K (test) | mAP | 44.04 | 27
Object Detection | SAR-Aircraft v1.0 (test) | mAP (AP'07) | 62.25 | 27
Showing 10 of 38 rows
