
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

About

In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds on the previous state-of-the-art real-time detector, RT-DETR. It introduces a set of bag-of-freebies for flexibility and practicality and optimizes the training strategy to improve performance. To improve flexibility, we propose setting a distinct number of sampling points for features at different scales in the deformable attention, so that the decoder can extract multi-scale features selectively. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator, which RT-DETR relies on but YOLO detectors do not; this removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameter customization to improve performance without loss of speed. Source code and pre-trained models will be available at https://github.com/lyuwenyu/RT-DETR.
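To make the two decoder-side ideas concrete, here is a minimal pure-Python sketch. It assumes single-channel feature maps stored as nested lists and sampling locations normalized to [0, 1]. It is not the RT-DETRv2 implementation, and all function names are hypothetical. `bilinear_sample` mimics bilinear interpolation with the `align_corners=False`, zero-padding convention used by PyTorch's `grid_sample`. `discrete_sample` is an assumed nearest-pixel lookup that needs only integer indexing. `sample_multiscale` lets each scale take its own number of sampling points.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly sample a 2D map (list of rows) at normalized (x, y) in [0, 1].

    Assumes the align_corners=False convention: pixel i has its centre at
    (i + 0.5) / W. Neighbours outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    px, py = x * w - 0.5, y * h - 0.5
    x0, y0 = math.floor(px), math.floor(py)
    dx, dy = px - x0, py - y0
    out = 0.0
    for yi, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xi, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yi < h and 0 <= xi < w:
                out += wx * wy * feat[yi][xi]
    return out


def discrete_sample(feat, x, y):
    """Hypothetical discrete operator: read the one pixel whose cell contains
    (x, y), clamped to the map. No interpolation weights, just integer indexing."""
    h, w = len(feat), len(feat[0])
    xi = min(max(math.floor(x * w), 0), w - 1)
    yi = min(max(math.floor(y * h), 0), h - 1)
    return feat[yi][xi]


def sample_multiscale(levels, points_per_level, locations, weights, sampler):
    """Weighted sum of sampled values with a separate point count per scale.

    levels: list of 2D maps, e.g. fine to coarse.
    points_per_level: e.g. [4, 4, 4] for a uniform split, or an uneven split
        such as [1, 2] (illustrative only) to favour some scales.
    locations, weights: flat lists with sum(points_per_level) entries,
        consumed level by level. Weights are taken as given (in deformable
        attention they would come from a softmax over the query's points).
    """
    assert len(locations) == len(weights) == sum(points_per_level)
    total, k = 0.0, 0
    for feat, n in zip(levels, points_per_level):
        for _ in range(n):
            x, y = locations[k]
            total += weights[k] * sampler(feat, x, y)
            k += 1
    return total


if __name__ == "__main__":
    fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 map
    coarse = [[10.0, 20.0], [30.0, 40.0]]                            # 2x2 map
    print(bilinear_sample(fine, 0.5, 0.5))   # between pixel centres: interpolated
    print(discrete_sample(fine, 0.5, 0.5))   # same point: a single pixel value
    locs = [(0.375, 0.125), (0.25, 0.25), (0.75, 0.75)]
    print(sample_multiscale([fine, coarse], [1, 2], locs, [0.5, 0.25, 0.25],
                            discrete_sample))
```

In this sketch, both samplers agree exactly at pixel centres and differ between them. The discrete version gives up sub-pixel interpolation in exchange for a plain index lookup, which is the kind of operator swap the abstract motivates by deployment constraints.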

Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu • 2024

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Object Detection | COCO 2017 (val) | AP 54.3 | 2643
Object Detection | CS-positive | mAP 38 | 25
Attention Heatmap Prediction | SurgAtt-SZPH (test) | NSS 2.37 | 18
Object Detection | Raw UAV Imagery Patches (test) | mAP 62.3 | 14
Object Detection | Orthomosaic UAV Imagery (test) | mAP 62.3 | 14
Dense GUI Parsing | GroundCUA full benchmark | Page IoU 38.8 | 10
Dense Parsing | ScreenParse (test) | Page IoU 60 | 10
Attention Heatmap Prediction | AutoLaparo SurgAtt | NSS 2.715 | 9
Attention Heatmap Prediction | SurgAtt-Hamlyn | NSS 1.744 | 9
Object Detection | Curated Building Facade Defect high-resolution images (test) | Precision 43.89 | 7

(10 of 11 rows shown.)
