PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

About

In a joint vision-language space, a text feature (e.g., from "a photo of a dog") can effectively represent its relevant image features (e.g., from dog photos). A recent study has also demonstrated the cross-modal transferability phenomenon of this joint space. Building on these observations, we propose PromptStyler, which addresses source-free domain generalization by simulating various distribution shifts in the joint space: it synthesizes diverse styles via prompts without using any images. The proposed method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located near their corresponding content features (from "[class]") in the joint vision-language space. After learning style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves the state of the art on PACS, VLCS, OfficeHome, and DomainNet, even though it does not require any images for training.
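
The two constraints described above (styles should be diverse, and style-content features should stay near their content features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `style_diversity_loss` and `content_consistency_loss` are hypothetical, plain Python lists stand in for text-encoder outputs, and the choice of absolute cosine similarity for the diversity term and a softmax cross-entropy over cosine similarities for the consistency term is an assumption made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def style_diversity_loss(new_style, previous_styles):
    """Mean absolute cosine similarity between a new style feature and
    previously learned ones (assumed form). Lower means more distinct."""
    if not previous_styles:
        return 0.0
    total = sum(abs(cosine(new_style, s)) for s in previous_styles)
    return total / len(previous_styles)


def content_consistency_loss(style_content_feats, content_feats):
    """Cross-entropy over cosine similarities (assumed form): the
    style-content feature of class m should be most similar to the
    content feature of class m."""
    total = 0.0
    for m, sc in enumerate(style_content_feats):
        sims = [cosine(sc, c) for c in content_feats]
        log_denominator = math.log(sum(math.exp(s) for s in sims))
        total += -(sims[m] - log_denominator)
    return total / len(style_content_feats)


# Toy stand-ins for features of "[class]" prompts (two classes).
content = [[1.0, 0.0], [0.0, 1.0]]
# Toy stand-ins for features of "a S* style of a [class]" prompts.
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each stays near its own class
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each drifts toward the other class

loss_aligned = content_consistency_loss(aligned, content)
loss_swapped = content_consistency_loss(swapped, content)
```

In this toy setup the aligned features give a lower consistency loss than the swapped ones, and a style feature orthogonal to all earlier styles gives a diversity loss of zero. In a real pipeline the vectors would come from a frozen vision-language text encoder, and only the style word vectors would receive gradients.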

Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak • 2023

Related benchmarks

Task	Dataset	Metric	Result	
Image Classification	DomainNet	Accuracy (ClipArt)	73.1	238
Domain Generalization	PACS, VLCS, OfficeHome, and DomainNet (test)	PACS Accuracy	98.6	28
Image Classification	Terra-Incognita (test)	Accuracy	30.5	25
Image Classification	Average	Accuracy	49.4	24
Tactile Recognition	Tactile Cross-Domain OF Real to X Unseen target domains	Average ACC	48.9	22
Image Classification	OF B 2.0	Accuracy	44.7	12
Image Classification	OF A 2.0	Accuracy	50.7	12
Image Classification	OF Real	Accuracy	55.7	12
Image Classification	TQ-DIGIT	Accuracy	51	12
Image Classification	TQ-DuraGel	Accuracy	52.6	12

Showing 10 of 22 rows
