Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation
About
Conventional Domain Adaptation (DA) methods aim to learn domain-invariant feature representations to improve target adaptation performance. However, we argue that domain-specificity is equally important, since in-domain trained models hold crucial domain-specific properties that benefit adaptation. We therefore build a framework that supports disentanglement and learning of both domain-specific and task-specific factors in a unified model. Motivated by the success of vision transformers in several multi-modal vision problems, we observe that queries can be leveraged to extract domain-specific factors. Hence, we propose a novel Domain-Specificity-inducing Transformer (DSiT) framework for disentangling and learning both domain-specific and task-specific factors. To achieve disentanglement, we construct novel Domain-Representative Inputs (DRI) with domain-specific information to train a domain classifier via a novel domain token. We are the first to utilize vision transformers for domain adaptation in a privacy-oriented source-free setting, and our approach achieves state-of-the-art performance on single-source, multi-source, and multi-target benchmarks.
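The core idea of a learnable domain token querying patch features can be illustrated with a minimal sketch. This is not the paper's implementation; all dimensions, weights, and the two-domain classifier head are illustrative assumptions, and a single NumPy attention step stands in for a full vision transformer backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 16 patch tokens, 32-dim embeddings.
num_patches, dim = 16, 32

# Patch embeddings for one image (stand-in for a ViT backbone's output).
patch_tokens = rng.standard_normal((num_patches, dim))

# Learnable domain token: analogous to a [CLS] token, but intended to
# capture domain-specific (style) information rather than class content.
domain_token = rng.standard_normal((1, dim))

# Single-head attention in which the domain token acts as the query
# and aggregates domain-specific evidence from the patch tokens.
W_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
W_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
W_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)

q = domain_token @ W_q                     # (1, dim)
k = patch_tokens @ W_k                     # (num_patches, dim)
v = patch_tokens @ W_v                     # (num_patches, dim)

scores = (q @ k.T) / np.sqrt(dim)          # (1, num_patches)
attn = np.exp(scores - scores.max())
attn /= attn.sum()                         # softmax over patches
domain_feature = attn @ v                  # (1, dim) attended domain feature

# Linear domain classifier over the attended feature
# (e.g. 2 domains: source vs. target).
W_cls = rng.standard_normal((dim, 2))
logits = domain_feature @ W_cls
domain_probs = np.exp(logits - logits.max())
domain_probs /= domain_probs.sum()

print(domain_feature.shape, domain_probs.shape)
```

In training, the domain token and classifier would be optimized on Domain-Representative Inputs so that domain-specific factors concentrate in this pathway, leaving the task head to focus on class content.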
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | DomainNet | Accuracy (ClipArt) | 55.3 | 161 |
| Domain Adaptation | Office-Home (test) | Mean Accuracy | 80.5 | 112 |
| Domain Adaptation | OFFICE | Average Accuracy | 93 | 96 |
| Domain Adaptation | VisDA-C (test) | S→R Score | 0.876 | 26 |
| Closed-set Source-Free Domain Adaptation | Office-Home | Average Accuracy | 80.5 | 22 |
| Closed-set Source-Free Domain Adaptation | Office-31 (target) | Average Accuracy | 93 | 19 |