
Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective

About

Endeavors have recently been made to leverage the vision transformer (ViT) for the challenging unsupervised domain adaptation (UDA) task. They typically adopt cross-attention in ViT for direct domain alignment. However, as the performance of cross-attention highly relies on the quality of pseudo labels for target samples, it becomes less effective when the domain gap is large. We solve this problem from a game-theoretical perspective with the proposed model, dubbed PMTrans, which bridges the source and target domains with an intermediate domain. Specifically, we propose a novel ViT-based module called PatchMix that effectively builds up the intermediate domain, i.e., a probability distribution, by learning to sample patches from both domains based on game-theoretical models. This way, it learns to mix patches from the source and target domains to maximize the cross entropy (CE), while exploiting two semi-supervised mixup losses in the feature and label spaces to minimize it. As such, we interpret the process of UDA as a min-max CE game with three players, including the feature extractor, classifier, and PatchMix, to find the Nash Equilibria. Moreover, we leverage attention maps from ViT to re-weight the label of each patch by its importance, making it possible to obtain more domain-discriminative feature representations. We conduct extensive experiments on four benchmark datasets, and the results show that PMTrans significantly surpasses the ViT-based and CNN-based SoTA methods by +3.6% on Office-Home, +1.4% on Office-31, and +17.7% on DomainNet, respectively.
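The core PatchMix idea, mixing image patches from the two domains and using the realized mix ratio as a soft label weight, can be illustrated with a minimal sketch. Note this is a hypothetical helper, not the authors' code: in the paper the per-patch sampling distribution is learned via the game-theoretical formulation (and labels are further re-weighted by attention), whereas here a fixed Bernoulli probability `lam` stands in for the learned sampler.

```python
import numpy as np

def patch_mix(src_img, tgt_img, patch_size, lam):
    """Mix patches of two same-sized HxWxC images.

    Each patch is drawn from the source image with probability `lam`,
    otherwise from the target image. Returns the mixed image and the
    realized fraction of source patches, which serves as the soft
    label weight for the mixed sample.
    """
    H, W, _ = src_img.shape
    ph, pw = H // patch_size, W // patch_size
    # Boolean mask over the patch grid: True -> take the source patch
    mask = np.random.rand(ph, pw) < lam
    mixed = tgt_img.copy()
    for i in range(ph):
        for j in range(pw):
            if mask[i, j]:
                ys, xs = i * patch_size, j * patch_size
                mixed[ys:ys + patch_size, xs:xs + patch_size] = \
                    src_img[ys:ys + patch_size, xs:xs + patch_size]
    src_frac = mask.mean()  # weight of the source label in the mixed label
    return mixed, src_frac
```

The mixed sample interpolates between the domains at the patch level; supervising it with a `src_frac`-weighted combination of the source and target (pseudo) labels is what lets the min-max CE game pull the two feature distributions together.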

Jinjing Zhu, Haotian Bai, Lin Wang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Unsupervised Domain Adaptation | Office-Home (test) | Average Accuracy: 89 | 332 |
| Unsupervised Domain Adaptation | Office-Home | Average Accuracy: 89 | 238 |
| Image Classification | DomainNet | Accuracy (ClipArt): 74.1 | 161 |
| Unsupervised Domain Adaptation | DomainNet | Average Accuracy: 54.5 | 100 |
| Unsupervised Domain Adaptation | DomainNet (test) | Average Accuracy: 52.4 | 97 |
| Object Classification | VisDA synthetic-to-real 2017 | Mean Accuracy: 88 | 91 |
| Unsupervised Domain Adaptation | Office-31 | A->W Accuracy: 99.5 | 83 |
| Image Classification | VisDA 2017 (test) | Class Accuracy (Plane): 98.2 | 83 |
| Unsupervised Domain Adaptation Classification | Office-31 (test) | Accuracy (A->W): 99.1 | 51 |
| Unsupervised Domain Adaptation | VisDA 2017 (test) | Plane Accuracy: 99.4 | 27 |
Showing 10 of 14 rows
