CLIP the Gap: A Single Domain Generalization Approach for Object Detection

About

Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments evidence the benefits of our approach, outperforming by 10% the only existing SDG object detection method, Single-DGOD [49], on their own diverse weather-driving benchmark.
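
As a rough illustration of the two ingredients named in the abstract, the sketch below scores a region feature against per-class text embeddings and applies a cross-entropy loss, and it shifts a feature by an offset vector. It is not the authors' implementation: the function names, the additive form of the augmentation, the temperature value, and the toy vectors standing in for vision-language embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def text_classification_loss(region_feature, text_embeddings, label, temperature=0.01):
    # Hypothetical helper: cross-entropy over temperature-scaled cosine
    # similarities between one region feature and one text embedding per class.
    f = l2_normalize(region_feature)
    logits = []
    for t in text_embeddings:
        t = l2_normalize(t)
        logits.append(sum(a * b for a, b in zip(f, t)) / temperature)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def semantic_augment(feature, offset):
    # Assumption: the augmentation is modeled as an additive shift in
    # feature space; the offset would come from a separate procedure.
    return [a + b for a, b in zip(feature, offset)]


# Toy stand-ins for a backbone feature and two class prompt embeddings.
feature = [1.0, 0.0]
class_text_embeddings = [[1.0, 0.0], [0.0, 1.0]]

loss_correct = text_classification_loss(feature, class_text_embeddings, label=0)
loss_wrong = text_classification_loss(feature, class_text_embeddings, label=1)
augmented = semantic_augment(feature, [0.0, 1.0])
```

In this toy setup the loss is small when the feature aligns with the text embedding of its labeled class and large otherwise, which is the behavior a text-based classification loss is meant to encourage.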

Vidit Vidit, Martin Engilberge, Mathieu Salzmann • 2023

Related benchmarks

Task	Dataset	Result
Object Detection	Watercolor2k (test)	mAP (Overall) 33.5	113
Object Detection	Comic2k (test)	mAP 43.4	62
Object Detection	Diverse Weather Datasets	DF 32	48
Object Detection	S-DGOD (test)	AP (NC) 36.9	27
Object Detection	Night Clear	mAP 36.9	24
Object Detection	Diverse Weather Dataset (DWD) (test)	mAP (Night-sunny) 36.9	24
Object Detection	Diverse-Weather Night Rainy (target)	mAP 18.7	20
Object Detection	DWD (Diverse Weather Dataset)	Night Clear 36.9	16
Object Detection	INBreast (adapted from DDSM) (test)	Recall @0.05 0.15	14
Object Detection	Driving Scenarios Day Foggy	mAP 38.5	13

Showing 10 of 43 rows
