DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
About
We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer by layer. Using box coordinates not only provides explicit positional priors that improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer by layer in a cascade manner. As a result, it leads to the best performance on the MS-COCO benchmark among the DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods. Code is available at https://github.com/SlongLiu/DAB-DETR.
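The layer-by-layer anchor update described in the abstract can be illustrated with a minimal sketch. It assumes anchors are stored as normalized (x, y, w, h) values in (0, 1) and that each decoder layer adds a predicted offset in logit space, a common convention in DETR-style box refinement. The function names and the offset values are hypothetical; in a real model the offsets would come from a prediction head on the decoder output, and this is not the authors' implementation.

```python
import math

def inverse_sigmoid(p, eps=1e-5):
    # Map a normalized coordinate in (0, 1) to logit space, clamped for stability.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def refine_anchor(anchor, delta):
    # One decoder layer's update (assumed form): add offsets in logit space,
    # then squash back so the box stays inside (0, 1).
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))

def refine_through_layers(anchor, deltas_per_layer):
    # Apply the update once per layer and keep every intermediate box.
    history = [anchor]
    for delta in deltas_per_layer:
        anchor = refine_anchor(anchor, delta)
        history.append(anchor)
    return history

# Hypothetical offsets standing in for a learned prediction head.
boxes = refine_through_layers((0.5, 0.5, 0.2, 0.3), [(0.1, -0.1, 0.2, 0.0)] * 3)
for layer, box in enumerate(boxes):
    print(layer, tuple(round(v, 3) for v in box))
```

Updating in logit space keeps each refined box a valid normalized box regardless of how large the predicted offset is, which is one plausible reason this form is widely used for cascaded refinement.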
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | COCO 2017 (val) | AP 49 | 2454 |
| Object Detection | COCO (val) | mAP 46.9 | 613 |
| Video Object Detection | ImageNet VID (val) | mAP (%) 54.2 | 341 |
| Object Detection | MS-COCO 2017 (val) | -- | 237 |
| Object Detection | COCO (minival) | mAP 38 | 184 |
| Object Detection | AI-TOD (test) | AP@0.5 42.6 | 88 |
| Object Detection | Pascal VOC | mAP 57.9 | 88 |
| Object Detection | DOTA v1.5 | mAP 23.3 | 37 |
| Object Detection | SARDet-100K (test) | mAP 44.04 | 27 |
| Object Detection | SAR-Aircraft v1.0 (test) | mAP (AP'07) 62.25 | 27 |