TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation

About

Unsupervised domain adaptation (UDA) aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. Although Vision Transformer (ViT) has recently been applied to a rapidly growing range of vision tasks, its capability to adapt cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and its performance can be further improved by incorporating adversarial adaptation. Nevertheless, directly applying CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits (e.g., its attention mechanism and sequential image representation), which play an important role in knowledge transfer. To remedy this, we propose a unified framework, namely Transferable Vision Transformer (TVT), to fully exploit the transferability of ViT for domain adaptation. Specifically, we carefully design a novel and effective unit, which we term the Transferability Adaption Module (TAM). By injecting learned transferabilities into attention blocks, TAM compels ViT to focus on features that are both transferable and discriminative. In addition, we leverage discriminative clustering to enhance feature diversity and separation, which are undermined during adversarial domain alignment. To verify its versatility, we perform extensive studies of TVT on four benchmarks, and the experimental results demonstrate that TVT attains significant improvements over existing state-of-the-art UDA methods.
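The abstract names two mechanisms only at a high level: TAM, which injects learned transferabilities into attention blocks, and a discriminative clustering objective. The following pure-Python sketch illustrates one plausible reading of each; it is not the authors' implementation. Assumptions (all hypothetical): a patch-level domain discriminator outputs `domain_probs`, the probability that each patch comes from the source domain; a patch's transferability is that output's normalized binary entropy; this score rescales only the class token's attention to patch tokens; and the clustering objective is written in the standard information-maximization form (mean prediction entropy minus entropy of the mean prediction). Every function name below is illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_transferability(domain_probs: Sequence[float], eps: float = 1e-12) -> Vector:
    """HYPOTHETICAL score: normalized entropy of a patch-level domain
    discriminator's output p = P(patch comes from the source domain).
    p near 0.5 (domain-ambiguous patch) -> score near 1;
    p near 0 or 1 (domain-specific patch) -> score near 0."""
    scores = []
    for p in domain_probs:
        p = min(max(p, eps), 1.0 - eps)  # keep p away from 0 and 1 to avoid log(0)
        scores.append(entropy([p, 1.0 - p]) / math.log(2.0))
    return scores


def transferability_weighted_cls_attention(
    q_cls: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    domain_probs: Sequence[float],
) -> Vector:
    """Attention output for the class-token query (HYPOTHETICAL variant).
    keys[0]/values[0] belong to the class token, keys[1:]/values[1:] to patches.
    Patch weights are multiplied by their transferability after the softmax;
    the class token's own weight is unchanged and no renormalization is done."""
    if len(domain_probs) != len(keys) - 1 or len(keys) != len(values):
        raise ValueError("expected one domain probability per patch token")
    scale = math.sqrt(len(q_cls))
    scores = [sum(q * k for q, k in zip(q_cls, key)) / scale for key in keys]
    weights = softmax(scores)
    factors = [1.0] + patch_transferability(domain_probs)
    weights = [w * f for w, f in zip(weights, factors)]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def discriminative_clustering_loss(batch_probs: Sequence[Sequence[float]]) -> float:
    """Information-maximization style objective (to be minimized):
    mean per-sample prediction entropy (pushes confident, well-separated
    predictions) minus the entropy of the mean prediction (pushes the batch
    to spread over many classes, i.e. keeps predictions diverse)."""
    n = len(batch_probs)
    num_classes = len(batch_probs[0])
    mean_entropy = sum(entropy(p) for p in batch_probs) / n
    marginal = [sum(p[c] for p in batch_probs) / n for c in range(num_classes)]
    return mean_entropy - entropy(marginal)
```

For example, with equal attention scores over a class token and two patches, a patch the discriminator labels confidently (`p = 1.0`) contributes almost nothing to the class token's output, while a domain-ambiguous patch (`p = 0.5`) keeps its full weight. Because the rescaled weights no longer sum to 1, whether to renormalize them is a design choice this sketch deliberately leaves open.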

Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang• 2021

Related benchmarks

Task	Dataset	Metric	Result
Unsupervised Domain Adaptation	Office-Home (test)	Average Accuracy	83.6
Image Classification	Office-Home (test)	Mean Accuracy	63.3
Unsupervised Domain Adaptation	Office-Home	Average Accuracy	83.6
Image Classification	Office-Home	Average Accuracy	83.6
Domain Adaptation	Office-31 unsupervised adaptation standard	Accuracy (A to W)	96.4
Object Classification	VisDA synthetic-to-real 2017	Mean Accuracy	83.92
Unsupervised Domain Adaptation	VisDA unsupervised domain adaptation 2017	Mean Accuracy	83.9
Image Classification	VisDA 2017 (test)	Class Accuracy (Plane)	94.6
Unsupervised Domain Adaptation	Office-31	A->W Accuracy	96.4
Image Classification	VisDA-C (test)	Mean Accuracy	83.9

Only 10 of the 19 related benchmark rows are listed here.
