
Prompt Vision Transformer for Domain Generalization

About

Though vision transformers (ViTs) have exhibited impressive ability for representation learning, we empirically find that they do not generalize well to unseen domains when trained with previous domain generalization algorithms. In this paper, we propose DoPrompt, a novel approach based on prompt learning that embeds the knowledge of source domains in domain prompts for target domain prediction. Specifically, a domain prompt is prepended to the ViT input tokens of each image from the corresponding source domain. Each domain prompt learns domain-specific knowledge efficiently, since it is optimized for only one domain. Meanwhile, we train a prompt adapter that produces a suitable prompt for each input image based on the learned source domain prompts. At test time, the adapted prompt generated by the prompt adapter exploits the similarity between the feature of the out-of-domain image and the source domains to properly integrate the source domain knowledge. Extensive experiments are conducted on four benchmark datasets. Our approach achieves a 1.4% improvement in average accuracy, which is 3.5 times the improvement of the state-of-the-art algorithm with a ViT backbone.
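The two moving parts described above (per-domain prompts prepended to ViT tokens, and an adapter that mixes them by similarity at test time) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the sizes, the mean-prompt "keys", and the function names (`adapt_prompt`, `prepend_prompt`) are all assumptions for the sake of the example, and real domain prompts would be learned parameters rather than random arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

num_domains, prompt_len, dim = 3, 4, 16  # hypothetical sizes
seq_len = 10                             # patch tokens per image

# One prompt per source domain (random stand-ins for learned parameters).
domain_prompts = rng.normal(size=(num_domains, prompt_len, dim))

def adapt_prompt(image_feature, domain_prompts):
    """Prompt adapter sketch: weight the source-domain prompts by the
    similarity between the image feature and each domain's mean prompt."""
    domain_keys = domain_prompts.mean(axis=1)   # (num_domains, dim)
    logits = domain_keys @ image_feature        # (num_domains,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                    # softmax over source domains
    # Weighted combination of the per-domain prompts.
    return np.tensordot(weights, domain_prompts, axes=1)  # (prompt_len, dim)

def prepend_prompt(tokens, prompt):
    """Prepend prompt tokens before the ViT input tokens."""
    return np.concatenate([prompt, tokens], axis=0)

# Test-time use on an out-of-domain image.
tokens = rng.normal(size=(seq_len, dim))   # patch embeddings
image_feature = rng.normal(size=(dim,))    # e.g. a pooled ViT feature
prompt = adapt_prompt(image_feature, domain_prompts)
augmented = prepend_prompt(tokens, prompt)
print(augmented.shape)  # (14, 16): prompt_len + seq_len tokens
```

During training, each image would instead receive the single prompt of its own source domain, so that prompt specializes to that domain; the adapter is what lets the model interpolate between domains at test time.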

Zangwei Zheng, Xiangyu Yue, Kai Wang, Yang You • 2022

Related benchmarks

Task                        Dataset                                                Result                      Rank
Image Classification        DomainNet                                              Accuracy (ClipArt): 67.6    161
Domain Generalization       DomainNet                                              Clipart Accuracy: 67.7      16
Skin lesion classification  PH2 (out-of-distribution)                              ROC AUC: 0.9133             12
Skin lesion classification  Aggregate (Derm7pt, PAD, PH2) (out-of-distribution)    Avg ROC-AUC: 82.06          12
Skin lesion classification  derm7pt dermoscopic (out-of-distribution)              ROC AUC: 0.8238             12
Skin lesion classification  PAD (out-of-distribution)                              ROC AUC: 83.81              12
Skin lesion classification  derm7pt clinical (out-of-distribution)                 ROC AUC: 0.7161             12
