Contrastive Mean Teacher for Domain Adaptive Object Detectors

About

Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain). Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. In this work, we identify the intriguing alignment and synergy between mean-teacher self-training and contrastive learning. Motivated by this, we propose Contrastive Mean Teacher (CMT), a unified, general-purpose framework that naturally integrates the two paradigms to maximize beneficial learning signals. Instead of using pseudo-labels solely for final predictions, our strategy extracts object-level features using pseudo-labels and optimizes them via contrastive learning, without requiring labels in the target domain. When combined with recent mean-teacher self-training methods, CMT achieves new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previous best by 2.1% mAP. Notably, CMT stabilizes performance and provides larger gains as pseudo-label noise increases.
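The two ingredients named above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that model weights are flat lists of floats and that object-level features have already been pooled from the pseudo-labeled boxes in a student view and a teacher view of the same image. The names `ema_update`, `normalize`, and `info_nce` are hypothetical. A plain instance-level InfoNCE loss stands in for the contrastive objective, and the paper's exact loss may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float = 0.999) -> Vector:
    """Mean-teacher step (weights assumed flat lists):
    teacher <- momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def normalize(v: Sequence[float]) -> Vector:
    """L2-normalize a feature vector (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(student_feats: Sequence[Sequence[float]],
             teacher_feats: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Instance-level InfoNCE over object features (a stand-in loss).

    Row i of both inputs is assumed to come from the same pseudo-labeled box,
    so (student i, teacher i) is the positive pair and every other teacher
    feature in the batch serves as a negative.
    """
    s = [normalize(v) for v in student_feats]
    t = [normalize(v) for v in teacher_feats]
    total = 0.0
    for i, si in enumerate(s):
        # Cosine similarity to every teacher feature, scaled by temperature.
        logits = [sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive
    return total / len(s)


if __name__ == "__main__":
    teacher_w = ema_update([0.0, 0.0], [1.0, 2.0], momentum=0.9)
    print(teacher_w)  # approximately [0.1, 0.2]

    feats = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    # Matching object features across views should give the lower loss.
    print(info_nce(feats, feats), info_nce(feats, swapped))
```

The teacher is never trained by gradient descent. It only tracks an exponential moving average of the student, which is why its pseudo-labels change slowly. The contrastive term pulls together features of the same pseudo-labeled object seen by both models, so it gives a learning signal even when the pseudo-label's class is uncertain.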

Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang • 2023

Related benchmarks

Task	Dataset	Result
Object Detection	Cityscapes to Foggy Cityscapes (test)	mAP 50.3	196
Object Detection	Foggy Cityscapes (test)	AP (Person) 45.9	134
Object Detection	Pascal VOC -> Clipart (test)	mAP 47	91
Object Detection	Foggy Cityscapes	mAP 50.3	60
Object Detection	KITTI to Cityscapes	AP (Car) 64.3	42
Object Detection	Foggy Cityscapes 0.02 (test)	AP (Person) 45.9	35
Object Detection	Clipart (test)	mAP 47	33
Object Detection	RainCityscape	AP0.521	24
Object Detection	Clipart, Comic, and Watercolor	mAP (Clipart) 47	22
Object Detection	Cityscapes to Foggy Cityscapes severity 0.02 1.0 (val)	AP (Person) 45.9	22


