CLIP the Gap: A Single Domain Generalization Approach for Object Detection
About
Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, which outperforms the only existing SDG object detection method, Single-DGOD [49], by 10% on its own diverse weather-driving benchmark.
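The two ingredients named in the abstract, a semantic augmentation applied to backbone features and a text-based classification loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`semantic_augment`, `text_classification_loss`), the toy 3-dimensional vectors standing in for vision-language embeddings, the additive form of the augmentation, and the temperature value are all assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def semantic_augment(feature, domain_text_emb, source_text_emb, scale=1.0):
    """Hypothetical augmentation: shift a feature along the direction that
    separates a target-domain prompt from a source-domain prompt."""
    return [f + scale * (d - s)
            for f, d, s in zip(feature, domain_text_emb, source_text_emb)]

def text_classification_loss(feature, class_text_embs, label, temperature=0.07):
    """Cross-entropy over cosine similarities between a region feature
    and one text embedding per class (assumed formulation)."""
    logits = [cosine(feature, t) / temperature for t in class_text_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

# Toy stand-ins for embeddings a vision-language model would produce.
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. prompts for two classes
source_emb = [0.1, 0.1, 0.5]   # e.g. a prompt describing the source domain
domain_emb = [0.1, 0.1, 0.9]   # e.g. a prompt describing an unseen domain
feature = [0.9, 0.1, 0.2]      # a region feature whose true class is index 0

augmented = semantic_augment(feature, domain_emb, source_emb)
loss_clean = text_classification_loss(feature, class_text_embs, label=0)
loss_aug = text_classification_loss(augmented, class_text_embs, label=0)
print(augmented, loss_clean, loss_aug)
```

In this toy setup the augmentation moves the feature along a domain direction while the loss still ties it to the class text embedding, which is the intuition behind training on shifted features with a text-anchored classifier.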
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | Watercolor2k (test) | mAP (Overall) | 33.5 | 113 |
| Object Detection | Comic2k (test) | mAP | 43.4 | 62 |
| Object Detection | Diverse Weather Datasets | DF | 32 | 27 |
| Object Detection | Diverse Weather Dataset (DWD) (test) | mAP (Night-sunny) | 36.9 | 24 |
| Object Detection | Night Clear | mAP | 36.9 | 15 |
| Object Detection | INBreast (adapted from DDSM) (test) | Recall @0.05 | 0.15 | 14 |
| Object Detection | Driving Scenarios Day Foggy | mAP | 38.5 | 13 |
| Object Detection | Driving Scenarios Night Sunny | mAP | 36.9 | 13 |
| Object Detection | Driving Scenarios Dusk | mAP | 32.3 | 13 |
| Object Detection | Driving Scenarios Rainy | mAP | 18.7 | 13 |