Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection

About

The current state-of-the-art methods in domain adaptive object detection (DAOD) use Mean Teacher self-labelling, where a teacher model, directly derived as an exponential moving average of the student model, is used to generate labels on the target domain which are then used to improve both models in a positive loop. This couples learning and generating labels on the target domain, and other recent works also leverage the generated labels to add additional domain alignment losses. We believe this coupling is brittle and excessively constrained: there is no guarantee that a student trained only on source data can generate accurate target domain labels and initiate the positive feedback loop, and much better target domain labels can likely be generated by using a large pretrained network that has been exposed to much more data. Vision foundational models are exactly such models, and they have shown impressive task generalization capabilities even when frozen. We want to leverage these models for DAOD and introduce DINO Teacher, which consists of two components. First, we train a new labeller on source data only using a large frozen DINOv2 backbone and show it generates more accurate labels than Mean Teacher. Next, we align the student's source and target image patch features with those from a DINO encoder, driving source and target representations closer to the generalizable DINO representation. We obtain state-of-the-art performance on multiple DAOD datasets. Code available at https://github.com/TRAILab/DINO_Teacher
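The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch: the Mean Teacher exponential-moving-average update, and a patch-feature alignment loss. It is not taken from the linked repository. The function names, the `decay` value, and the choice of cosine distance are assumptions made for illustration, since the abstract does not specify them. The sketch also assumes that student and reference patch features already share the same dimension; in practice a projection layer would likely be needed.

```python
import numpy as np


def ema_update(teacher, student, decay=0.999):
    """Return teacher weights updated as an EMA of the student weights.

    `teacher` and `student` are dicts mapping parameter names to arrays.
    `decay` is an assumed hyperparameter, not a value from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def patch_alignment_loss(student_patches, reference_patches):
    """Mean cosine distance between matching rows of two (num_patches, dim) arrays.

    Cosine distance is an assumed stand-in for the paper's alignment loss.
    """
    s = student_patches / np.linalg.norm(student_patches, axis=1, keepdims=True)
    r = reference_patches / np.linalg.norm(reference_patches, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * r, axis=1)))
```

In Mean Teacher the teacher is tied to the student through `ema_update`. DINO Teacher instead trains a separate labeller on a frozen DINOv2 backbone and applies an alignment term to both source and target images.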

Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Object Detection | Cityscapes to Foggy Cityscapes (test) | mAP | 55.4 | 196 |
| Object Detection | BDD100K (val) | mAP | 47.8 | 60 |
| Object Detection | Cityscapes → BDD100k | Truck AP | 44.3 | 18 |
| Domain Adaptive Object Detection | Foggy Cityscapes (val) | AP (Person) | 48.5 | 18 |
| Object Detection | Foggy Cityscapes full (val) | AP (Person) | 48.5 | 15 |
| Object Detection | ACDC | mAP50 (Fog) | 68.6 | 10 |
| Object Detection | ACDC Fog (test) | mAP | 68.6 | 3 |
| Object Detection | ACDC Night (test) | mAP | 36.4 | 3 |
| Object Detection | ACDC Rain (test) | mAP | 39 | 3 |
| Object Detection | ACDC Snow (test) | mAP | 0.568 | 3 |
