Contrastive Mean Teacher for Domain Adaptive Object Detectors
About
Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain). Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. In this work, we identify the intriguing alignment and synergy between mean-teacher self-training and contrastive learning. Motivated by this, we propose Contrastive Mean Teacher (CMT) -- a unified, general-purpose framework with the two paradigms naturally integrated to maximize beneficial learning signals. Instead of using pseudo-labels solely for final predictions, our strategy extracts object-level features using pseudo-labels and optimizes them via contrastive learning, without requiring labels in the target domain. When combined with recent mean-teacher self-training methods, CMT leads to new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previous best by 2.1% mAP. Notably, CMT can stabilize performance and provide more significant gains as pseudo-label noise increases.
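The two ingredients named above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic exponential-moving-average teacher update and a supervised-contrastive-style loss in which object features sharing a pseudo-label class are treated as positives. The function names (`ema_update`, `object_contrastive_loss`), the `momentum` and `temperature` values, and the use of plain Python lists in place of detector feature tensors are all illustrative assumptions.

```python
import math


def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher step: each teacher weight becomes an exponential
    moving average of the matching student weight (hypothetical helper)."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def object_contrastive_loss(features, pseudo_labels, temperature=0.07):
    """Supervised-contrastive-style loss over object-level features.

    features:      one feature vector per pseudo-labelled box
    pseudo_labels: class id assigned to each box by the teacher
    Anchors with no same-class partner are skipped.
    """
    feats = [_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other object.
        sims = {j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor objects.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    clean = object_contrastive_loss(feats, [0, 0, 1, 1])
    noisy = object_contrastive_loss(feats, [0, 1, 0, 1])
    print(f"clean pseudo-labels: {clean:.4f}, noisy pseudo-labels: {noisy:.4f}")
```

In this toy setting the loss is low when pseudo-labels agree with the feature clusters and high when they do not, which is the signal a contrastive objective of this kind would feed back to the student. In a real detector the features would come from region-of-interest pooling over pseudo-labelled boxes rather than hand-written vectors.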
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | Cityscapes to Foggy Cityscapes (test) | mAP | 50.3 | 196 |
| Object Detection | Foggy Cityscapes (test) | mAP (Mean Average Precision) | 50.3 | 108 |
| Object Detection | Pascal VOC -> Clipart (test) | mAP | 47 | 78 |
| Object Detection | Foggy Cityscapes | mAP | 50.3 | 47 |
| Object Detection | KITTI to Cityscapes | AP (Car) | 64.3 | 42 |
| Object Detection | RainCityscape | AP | 0.521 | 24 |
| Object Detection | Clipart, Comic, and Watercolor | mAP (Clipart) | 47 | 22 |
| Object Detection | Clipart (test) | mAP | 47 | 22 |
| Object Detection | Cityscapes to Foggy Cityscapes severity 0.02 1.0 (val) | AP (Person) | 45.9 | 22 |
| Object Detection | Clipart1k 1.0 (test) | aero AP | 39.8 | 21 |