Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

About

We analyze the DETR-based framework on semi-supervised object detection (SSOD) and observe that (1) the one-to-one assignment strategy generates incorrect matching when the pseudo ground-truth bounding box is inaccurate, leading to training inefficiency; (2) DETR-based detectors lack deterministic correspondence between the input query and its prediction output, which hinders the applicability of the consistency-based regularization widely used in current SSOD methods. We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector, to tackle these problems. Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage. Besides, we introduce a Crossview Query Consistency method to learn the semantic feature invariance of object queries from different views while avoiding the need to find deterministic query correspondence. Furthermore, we propose a Cost-based Pseudo Label Mining module to dynamically mine more pseudo boxes based on the matching cost of pseudo ground truth bounding boxes for consistency training. Extensive experiments on all SSOD settings of both COCO and Pascal VOC benchmark datasets show that our Semi-DETR method outperforms all state-of-the-art methods by clear margins. The PaddlePaddle version of the code is available at https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr.
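To make the stage-wise idea in the abstract concrete, here is a minimal sketch of switching from one-to-many to one-to-one assignment on a query-to-box cost matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names, the top-k rule, the tie-breaking for queries claimed by several boxes, the `switch_step` parameter, and the toy cost values are all hypothetical, and the brute-force search stands in for the Hungarian matching a real detector would use.

```python
from itertools import permutations


def one_to_one(cost):
    """Lowest-total-cost assignment with one query per pseudo box.

    cost[q][g] is the cost of matching query q to pseudo box g.
    Brute force over permutations: only suitable for tiny examples.
    """
    num_queries, num_boxes = len(cost), len(cost[0])
    best = min(
        permutations(range(num_queries), num_boxes),
        key=lambda qs: sum(cost[q][g] for g, q in enumerate(qs)),
    )
    return sorted((q, g) for g, q in enumerate(best))


def one_to_many(cost, k):
    """Each pseudo box takes its k lowest-cost queries (assumed rule).

    A query selected by several boxes keeps the box with the lowest cost.
    """
    chosen = {}
    for g in range(len(cost[0])):
        ranked = sorted(range(len(cost)), key=lambda q: cost[q][g])
        for q in ranked[:k]:
            if q not in chosen or cost[q][g] < cost[q][chosen[q]]:
                chosen[q] = g
    return sorted(chosen.items())


def hybrid_match(cost, step, switch_step, k=2):
    """Use one-to-many matching before switch_step, one-to-one afterwards."""
    if step < switch_step:
        return one_to_many(cost, k)
    return one_to_one(cost)


# Toy costs for 4 queries (rows) and 2 pseudo boxes (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.4],
]

early = hybrid_match(cost, step=0, switch_step=100)
late = hybrid_match(cost, step=100, switch_step=100)
print(early)  # [(0, 0), (1, 0), (2, 1), (3, 1)]
print(late)   # [(0, 0), (2, 1)]
```

In the early stage every pseudo box supervises several queries, so a slightly inaccurate pseudo box still yields useful positives; in the later stage each box keeps a single query, which preserves the end-to-end, duplicate-free behaviour of DETR-style detectors.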

Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li • 2023

Related benchmarks

Task	Dataset	Result
Object Detection	Pascal VOC	mAP 86.1	126
Text Detection	Total-Text (test)	F-Measure 84	126
Text Detection	ICDAR 2015 (test)	F1 Score 87.6	108
Scene Text Spotting	Total-Text (test)	F-measure (None) 81.7	105
Object Detection	COCO standard (5% labeled)	mAP 40.1	70
End-to-End Text Spotting	ICDAR 2015 (test)	Generic F-measure 71.1	62
Object Detection	COCO standard (10%)	mAP 43.5	54
Object Detection	COCO standard (1%)	--	44
Text Spotting	ICDAR 2015 (test)	Accuracy (Strong Lexicon) 85.4	36
Object Detection	COCO 1% labeled 2017 (val train)	mAP 30.5	30

