Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

About

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success cannot be achieved without the re-introduction of multi-scale feature fusion in the encoder. However, the excessively increased tokens in multi-scale features, especially the low-level features that account for about 75% of all tokens, are quite computationally inefficient, which hinders real applications of DETR models. In this paper, we present Lite DETR, a simple yet efficient end-to-end object detection framework that can effectively reduce the GFLOPs of the detection head by 60% while keeping 99% of the original performance. Specifically, we design an efficient encoder block to update high-level features (corresponding to small-resolution feature maps) and low-level features (corresponding to large-resolution feature maps) in an interleaved way. In addition, to better fuse cross-scale features, we develop a key-aware deformable attention to predict more reliable attention weights. Comprehensive experiments validate the effectiveness and efficiency of the proposed Lite DETR, and the efficient encoder strategy can generalize well across existing DETR-based models. The code will be available at https://github.com/IDEA-Research/Lite-DETR.
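The interleaved update described in the abstract can be sketched as a simple block schedule: the cheap high-level tokens (roughly 25% of all tokens) are refined in every encoder block, while the expensive low-level tokens (roughly 75%) are refreshed only once per block group. The function below is an illustrative sketch, not the official implementation; the names and group structure are assumptions for clarity.

```python
# Hypothetical sketch of Lite DETR's interleaved update schedule.
# High-level tokens are updated in every block of a group; low-level
# tokens get a single refresh at the end of each group, which is where
# most of the encoder GFLOPs savings come from.

def interleaved_schedule(blocks_per_group, num_groups):
    """Return, for each encoder block in order, which feature set it updates."""
    schedule = []
    for _ in range(num_groups):
        # frequent cheap updates of the small-resolution (high-level) maps
        schedule.extend(["high"] * blocks_per_group)
        # one refresh of the large-resolution (low-level) maps per group
        schedule.append("low")
    return schedule

print(interleaved_schedule(3, 2))
# → ['high', 'high', 'high', 'low', 'high', 'high', 'high', 'low']
```

Because the low-level refresh runs only `num_groups` times rather than once per block, the number of expensive large-resolution updates drops roughly in proportion to the group size.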

Feng Li, Ailing Zeng, Shilong Liu, Hao Zhang, Hongyang Li, Lei Zhang, Lionel M. Ni • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1201 |
| Panoptic Segmentation | COCO 2017 (val) | PQ | 52.7 | 185 |
| Semantic Segmentation | COCO 2017 (val) | mIoU | 63.08 | 66 |
| Panoptic Segmentation | Cityscapes | PQ | 62.29 | 32 |
| Object Detection | BUSI | AP@0.5 (BN) | 74.5 | 19 |
| Object Detection | Thyroid II | AP@0.5 (BN) | 92.8 | 19 |
| Object Detection | TN3K | AP | 50.8 | 19 |
| Object Detection | Thyroid I (test) | AP@0.5 (BN) | 0.911 | 19 |
