StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

About

Large-scale foundation models, such as CLIP, have demonstrated impressive zero-shot generalization performance on downstream tasks, leveraging well-designed language prompts. However, these prompt learning techniques often struggle with domain shift, limiting their generalization capabilities. In our study, we tackle this issue by proposing StyLIP, a novel approach for Domain Generalization (DG) that enhances CLIP's classification performance across domains. Our method focuses on a domain-agnostic prompt learning strategy, aiming to disentangle the visual style and content information embedded in CLIP's pre-trained vision encoder, enabling effortless adaptation to novel domains during inference. To achieve this, we introduce a set of style projectors that directly learn the domain-specific prompt tokens from the extracted multi-scale style features. These generated prompt embeddings are subsequently combined with the multi-scale visual content features learned by a content projector. The projectors are trained in a contrastive manner, utilizing CLIP's fixed vision and text backbones. Through extensive experiments conducted in five different DG settings on multiple benchmark datasets, we consistently demonstrate that StyLIP outperforms the current state-of-the-art (SOTA) methods.
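
The abstract describes the pipeline only at a high level. Below is a minimal pure-Python sketch of its forward data flow under stated assumptions. It is not the authors' implementation. The assumptions are:

- "Style" at each encoder layer is summarized as the per-channel mean and standard deviation of that layer's feature map (an AdaIN-style choice).
- Each style projector is a single bias-free linear map that emits one prompt token per layer.
- The content projector's output is added to every style token.
- The prompt ends with a class-name embedding.
- CLIP's frozen encoders are replaced by fixed random stand-ins.

All names (`style_stats`, `build_prompts`, `frozen_text_encoder`, `EMBED_DIM`, `demo`) are hypothetical.

```python
import math
import random

# Hypothetical token width shared by the stand-in text encoder and the projectors.
EMBED_DIM = 8


def style_stats(feature_map):
    """Assumed style descriptor: per-channel mean and std of one layer's map.

    feature_map: C channels, each a flat list of spatial activations.
    Returns [mu_1, ..., mu_C, sigma_1, ..., sigma_C].
    """
    mus, sigmas = [], []
    for channel in feature_map:
        mu = sum(channel) / len(channel)
        var = sum((x - mu) ** 2 for x in channel) / len(channel)
        mus.append(mu)
        sigmas.append(math.sqrt(var + 1e-6))  # epsilon keeps sigma > 0
    return mus + sigmas


def matvec(weights, vec):
    """Bias-free linear layer; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def frozen_text_encoder(tokens, proj):
    """Stand-in for CLIP's frozen text encoder: mean-pool tokens, fixed projection."""
    pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
    return matvec(proj, pooled)


def build_prompts(layer_maps, content_feats, class_embs, style_projs, content_proj):
    """Per class: one style token per layer (plus content shift), then the class token."""
    # One style projector per encoder layer maps that layer's style stats to a token.
    style_tokens = [matvec(P, style_stats(fm)) for P, fm in zip(style_projs, layer_maps)]
    # The content projector maps the concatenated multi-scale content features to
    # one vector, which (by assumption) is added to every style token.
    content = matvec(content_proj, [x for feat in content_feats for x in feat])
    tokens = [[s + c for s, c in zip(tok, content)] for tok in style_tokens]
    return [tokens + [cls] for cls in class_embs]


def contrastive_loss(image_emb, text_embs, label, temperature=0.07):
    """CLIP-style cross-entropy over temperature-scaled cosine similarities."""
    img = normalize(image_emb)
    logits = [sum(a * b for a, b in zip(img, normalize(t))) / temperature
              for t in text_embs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return log_z - logits[label]


def demo(seed=0, channels=4, spatial=9, layers=3, n_classes=5, label=2):
    """Forward pass only, on random data. In the described method only the
    projectors, not the frozen encoders, would receive gradient updates."""
    rng = random.Random(seed)
    layer_maps = [[[rng.random() for _ in range(spatial)] for _ in range(channels)]
                  for _ in range(layers)]
    content_feats = [[rng.random() for _ in range(channels)] for _ in range(layers)]
    class_embs = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(n_classes)]
    style_projs = [rand_matrix(EMBED_DIM, 2 * channels, rng) for _ in range(layers)]
    content_proj = rand_matrix(EMBED_DIM, layers * channels, rng)
    text_proj = rand_matrix(EMBED_DIM, EMBED_DIM, rng)  # frozen stand-in
    image_emb = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]  # frozen stand-in
    prompts = build_prompts(layer_maps, content_feats, class_embs, style_projs, content_proj)
    text_embs = [frozen_text_encoder(p, text_proj) for p in prompts]
    return contrastive_loss(image_emb, text_embs, label)


if __name__ == "__main__":
    print(f"contrastive loss: {demo():.4f}")
```

The sketch keeps the style path (statistics → one token per layer) separate from the content path (pooled features → one shift vector), which mirrors the style/content disentanglement described above. Swapping in real CLIP encoders and a training loop would be needed to say anything about accuracy.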

Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee • 2023

Related benchmarks

Task	Dataset	Metric	Result
Domain Generalization	VLCS	Accuracy	87.21	347
Domain Generalization	PACS	Accuracy	98.17	323
Image Classification	PACS	Overall Average Accuracy	92.5	299
Domain Generalization	Digits-DG	Accuracy	96.73	79
Image Classification	VLCS	Accuracy (Caltech101)	99.46	66
Domain Generalization	ImageNet variants (V2, S, A, R) (test)	ImageNet-V2 Accuracy	56.6	57
Domain Generalization	Office-Home	Overall Average Accuracy	85.94	34
Domain Generalization	Office-Home	Accuracy (Art Domain)	84.93	29
Domain Generalization	DomainNet Mini	Accuracy	80.43	27
Open Set Domain Generalization	Office-Home H=1	Accuracy	52.34	23

