D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

About

We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively refining probability distributions, providing a fine-grained intermediate representation that significantly enhances localization accuracy. GO-LSD is a bidirectional optimization strategy that transfers localization knowledge from refined distributions to shallower layers through self-distillation, while also simplifying the residual prediction tasks for deeper layers. Additionally, D-FINE incorporates lightweight optimizations in computationally intensive modules and operations, achieving a better balance between speed and accuracy. Specifically, D-FINE-L / X achieves 54.0% / 55.8% AP on the COCO dataset at 124 / 78 FPS on an NVIDIA T4 GPU. When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors. Furthermore, our method significantly enhances the performance of a wide range of DETR models by up to 5.3% AP with negligible extra parameters and training costs. Our code and pretrained models: https://github.com/Peterande/D-FINE.

Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu • 2024
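
The abstract's two ideas, refining a probability distribution layer by layer (FDR) and distilling the final distribution back into shallower layers (GO-LSD), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the evenly spaced offset bins, the zero-initialized logits, and the plain KL divergence used as a distillation signal are all simplifying assumptions made here for illustration. The released code at the repository linked above is the reference.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_offsets(num_bins, max_offset):
    """Candidate edge offsets, evenly spaced in [-max_offset, max_offset].

    Even spacing is an assumption of this sketch, chosen for simplicity.
    """
    step = 2 * max_offset / (num_bins - 1)
    return [-max_offset + i * step for i in range(num_bins)]


def refine_edge(initial_edge, layer_residual_logits, offsets, scale):
    """Refine one box edge across decoder layers (hypothetical helper).

    Each layer predicts residual logits that are added to the logits
    accumulated so far. The edge estimate is the initial edge plus the
    expected offset under the resulting distribution, scaled by the box size.
    Returns (per-layer edge estimates, per-layer distributions).
    """
    logits = [0.0] * len(offsets)  # assumed starting point: uniform distribution
    estimates, distributions = [], []
    for residual in layer_residual_logits:
        logits = [a + b for a, b in zip(logits, residual)]
        probs = softmax(logits)
        expected = sum(p * w for p, w in zip(probs, offsets))
        estimates.append(initial_edge + scale * expected)
        distributions.append(probs)
    return estimates, distributions


def kl_divergence(teacher, student):
    """KL(teacher || student), used here as a stand-in distillation loss."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


offsets = bin_offsets(num_bins=5, max_offset=0.5)
residuals = [
    [0.0, 0.0, 0.0, 0.0, 0.0],  # layer 1: no evidence yet, distribution stays uniform
    [0.0, 0.0, 0.0, 2.0, 0.0],  # layer 2: shifts probability mass toward +0.25
    [0.0, 0.0, 0.0, 2.0, 0.0],  # layer 3: sharpens the same peak
]
estimates, distributions = refine_edge(10.0, residuals, offsets, scale=4.0)

# Distillation signal: how far each shallower layer is from the final distribution.
distill_losses = [kl_divergence(distributions[-1], d) for d in distributions[:-1]]
```

In this toy run, each layer only has to predict a small correction to the accumulated logits rather than a full coordinate, and the KL terms shrink as shallower layers approach the final distribution, which is the direction a self-distillation loss would push them.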

Related benchmarks

Task	Dataset	Result
Object Detection	COCO 2017 (val)	AP 59.3	2843
Object Detection	VisDrone 2019 (val)	AP@0.5 51.5	50
Object Detection	HRSID (test)	mAP50 88.2	11
Object Detection	SAR-Ship Dataset	mAP50 97.2	10
Object Detection	SpaceNet	mAP50:95 74.3	7
Object Detection	NightDVS22	mAP 47.8	2
