RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer

About

In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds upon the previous state-of-the-art real-time detector, RT-DETR, and introduces a bag-of-freebies for flexibility and practicality, along with an optimized training strategy, to achieve enhanced performance. To improve flexibility, we suggest setting a distinct number of sampling points for features at different scales in the deformable attention, so that the decoder can extract multi-scale features selectively. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator, which RT-DETR relies on but YOLOs do not. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameter customization to improve performance without loss of speed. Source code and pre-trained models will be available at https://github.com/lyuwenyu/RT-DETR.
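
The two sampling ideas in the abstract can be illustrated with a small dependency-free sketch. It is not the authors' implementation. It assumes that "discrete sampling" means rounding each sampling location to the nearest pixel instead of interpolating bilinearly as grid_sample does, and the per-scale point counts in `NUM_POINTS_PER_SCALE` are hypothetical values chosen only for illustration. All function names are made up for this example.

```python
import math

def bilinear_sample(feat, x, y):
    """Interpolate feat (a list of rows) at continuous pixel coords (x, y).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * feat[yi][xi]
    return value

def discrete_sample(feat, x, y):
    """Assumed discrete variant: round to the nearest pixel, clamp, and index.

    No interpolation is involved, so only plain indexing is needed.
    Note that clamping differs from the zero padding used above.
    """
    h, w = len(feat), len(feat[0])
    xi = min(max(math.floor(x + 0.5), 0), w - 1)
    yi = min(max(math.floor(y + 0.5), 0), h - 1)
    return feat[yi][xi]

# Hypothetical budget: more sampling points on the finest scale, fewer on coarser ones.
NUM_POINTS_PER_SCALE = [4, 2, 1]

def sample_multiscale(feats, points, weights, sampler):
    """Weighted sum of sampled values over all scales for a single query.

    points[l] holds NUM_POINTS_PER_SCALE[l] normalized (x, y) pairs in [0, 1];
    weights[l] holds the matching attention weights.
    """
    out = 0.0
    for level, feat in enumerate(feats):
        h, w = len(feat), len(feat[0])
        assert len(points[level]) == NUM_POINTS_PER_SCALE[level]
        for (nx, ny), wgt in zip(points[level], weights[level]):
            # Map normalized coordinates to pixel-centre coordinates.
            out += wgt * sampler(feat, nx * w - 0.5, ny * h - 0.5)
    return out
```

Passing `discrete_sample` or `bilinear_sample` as `sampler` switches between the two behaviours without changing the rest of the attention computation, which is the sense in which such an operator could be optional.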

Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu • 2024

Related benchmarks

Task	Dataset	Result
Object Detection	COCO 2017 (val)	AP 54.3	2843
Object Detection	VisDrone 2019 (val)	AP@0.5 49.1	50
Object Detection	CS-positive	mAP 38	25
Attention Heatmap Prediction	SurgAtt-SZPH (test)	NSS 2.37	18
Object Detection	Raw UAV Imagery Patches (test)	mAP 62.3	14
Object Detection	Orthomosaic UAV Imagery (test)	mAP 62.3	14
Dense GUI Parsing	GroundCUA full benchmark	Page IoU 38.8	10
Dense Parsing	ScreenParse (test)	Page IoU 60	10
Attention Heatmap Prediction	AutoLaparo SurgAtt	NSS 2.715	9
Attention Heatmap Prediction	SurgAtt-Hamlyn	NSS 1.744	9

Showing 10 of 13 rows
