Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

About

The advent of high-capacity pre-trained models has revolutionized problem-solving in computer vision, shifting the focus from training task-specific models to adapting pre-trained models. Consequently, effectively adapting large pre-trained models to downstream tasks in an efficient manner has become a prominent research area. Existing solutions primarily concentrate on designing lightweight adapters and their interaction with pre-trained models, with the goal of minimizing the number of parameters requiring updates. In this study, we propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation from a fresh perspective. Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme. Specifically, we leverage symmetric down-/up-projections to construct bottleneck operations, which are shared across layers. By learning low-dimensional re-scaling coefficients, we can effectively re-compose layer-adaptive adapters. This parameter-sharing strategy in adapter design allows us to significantly reduce the number of new parameters while maintaining satisfactory performance, thereby offering a promising approach to compress the adaptation cost. We conduct experiments on 24 downstream image classification tasks using various Vision Transformer variants to evaluate our method. The results demonstrate that our approach achieves compelling transfer learning performance with a reduced parameter count. Our code is available at https://github.com/DavidYanAnDe/ARC.
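
The parameter-sharing idea in the abstract can be sketched in a few lines. This is a minimal illustration of one plausible reading, not the authors' implementation (see the linked repository for that): a single down-projection matrix is shared by every layer, the up-projection is assumed to be its transpose (the "symmetric" part), and each layer learns only a small vector of re-scaling coefficients applied in the bottleneck. All names here (`make_shared_projection`, `arc_adapter`, `count_new_params`) and the sizes used (768-dimensional tokens, bottleneck of 50, 12 layers) are hypothetical, and biases, nonlinearities, and the exact placement inside the Transformer block are left out.

```python
import random


def make_shared_projection(dim, rank, seed=0):
    """Create the shared down-projection D with shape (dim, rank).

    Assumption: the up-projection is simply D transposed, so no second
    matrix is stored.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(dim)]


def arc_adapter(x, shared_down, coeffs):
    """Apply a residual bottleneck to one token vector x.

    Computes x + ((x @ D) * c) @ D^T, where D is shared across layers
    and c holds the re-scaling coefficients of the current layer.
    """
    dim, rank = len(x), len(coeffs)
    # Down-project into the bottleneck, then re-scale per layer.
    hidden = [
        coeffs[j] * sum(x[i] * shared_down[i][j] for i in range(dim))
        for j in range(rank)
    ]
    # Up-project with the transpose of the same shared matrix.
    delta = [
        sum(hidden[j] * shared_down[i][j] for j in range(rank))
        for i in range(dim)
    ]
    return [xi + di for xi, di in zip(x, delta)]


def count_new_params(dim, rank, num_layers, shared):
    """Count trainable adapter weights (biases ignored)."""
    if shared:
        # One shared matrix plus one coefficient vector per layer.
        return dim * rank + num_layers * rank
    # Independent down- and up-projection in every layer.
    return num_layers * 2 * dim * rank


# Hypothetical sizes, chosen only for illustration.
dim, rank, num_layers = 768, 50, 12
shared_count = count_new_params(dim, rank, num_layers, shared=True)
independent_count = count_new_params(dim, rank, num_layers, shared=False)
print(shared_count, independent_count)
```

Under these assumed sizes the shared scheme needs one 768 x 50 matrix plus twelve 50-element coefficient vectors, while independent per-layer adapters need twenty-four 768 x 50 matrices. Setting a layer's coefficients to zero makes `arc_adapter` return its input unchanged, so in this sketch each layer's deviation from the pre-trained model is controlled entirely by its coefficient vector.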

Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang • 2023

Related benchmarks

Task                               | Dataset              | Result                     | Rank
Image Classification               | Stanford Cars        | Accuracy 89.5              | 477
Fine-grained Image Classification  | Stanford Cars (test) | Accuracy 89.5              | 348
Image Classification               | CUB-200 2011         | Accuracy 89.3              | 257
Image Classification               | VTAB 1K              | Overall Mean Accuracy 74.3 | 204
Fine-grained visual classification | NABirds (test)       | Top-1 Accuracy 85.7        | 157
Image Classification               | Stanford Dogs        | Accuracy 91.9              | 130
Image Classification               | VTAB 1k (test)       | Accuracy (Natural) 82.3    | 121
Image Classification               | VTAB-1K 1.0 (test)   | Natural Accuracy 82.3      | 102
Image Classification               | Oxford Flowers       | Top-1 Accuracy 99.7        | 78
Visual Task Adaptation             | VTAB 1K              | Average Accuracy 72.6      | 78
Showing 10 of 17 rows
