
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

About

Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while keeping the majority of pre-trained parameters frozen. A key challenge is striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features. Currently, little attention is paid to guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also, through a residual design, ensures that new parameters do not deviate excessively from the pre-trained model. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks while using a comparable number of new parameters. We believe this work takes a step toward a unified perspective for interpreting existing methods and serves as motivation for developing new approaches that better address this trade-off. Our code is available at https://github.com/zstarN70/RLRR.git.
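The residual idea described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' formulation: the function name `rescaled_weight`, the factors `left` and `right`, the rank, the shapes, and the zero initialization are all hypothetical choices made here, and the linked repository remains the reference for the actual method. The sketch shows one plausible way a frozen weight matrix can be rescaled by a low-rank term added as a residual, so that the tuned weights start exactly at the pre-trained ones, and how the singular values of the two matrices can be compared.

```python
import numpy as np


def rescaled_weight(w_frozen, left, right):
    """Frozen weight plus an element-wise low-rank rescaled residual.

    Hypothetical form: W' = W + W * (left @ right.T).
    This is an illustrative assumption, not the paper's exact equation.
    """
    scale = left @ right.T  # shape (d_out, d_in), rank at most `rank`
    return w_frozen + w_frozen * scale


rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2

# Stand-in for a pre-trained weight matrix; it is never updated.
w_frozen = rng.standard_normal((d_out, d_in))

# New trainable factors. Zero-initializing one factor makes the residual vanish.
left = np.zeros((d_out, rank))
right = rng.standard_normal((d_in, rank))

w_init = rescaled_weight(w_frozen, left, right)
identical_at_init = np.allclose(w_init, w_frozen)

# Simulate a small update to the trainable factor.
left = 0.01 * rng.standard_normal((d_out, rank))
w_tuned = rescaled_weight(w_frozen, left, right)

# Compare singular values before and after the simulated update.
sv_frozen = np.linalg.svd(w_frozen, compute_uv=False)
sv_tuned = np.linalg.svd(w_tuned, compute_uv=False)

# Number of new parameters versus the size of the frozen matrix.
new_params = left.size + right.size
frozen_params = w_frozen.size
```

In this toy setting the new factors add `rank * (d_out + d_in)` parameters, which grows linearly with the layer width, while the frozen matrix grows with `d_out * d_in`. Printing `sv_frozen` and `sv_tuned` side by side is one way to inspect how far the tuned matrix has moved from the pre-trained one.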

Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Fine-grained Image Classification | Stanford Cars (test) | Accuracy | 90.4 | 348 |
| Image Classification | VTAB 1K | Overall Mean Accuracy | 76.7 | 204 |
| Fine-grained visual classification | NABirds (test) | Top-1 Accuracy | 85.3 | 157 |
| Image Classification | VTAB 1k (test) | Accuracy (Natural) | 83.9 | 121 |
| Image Classification | VTAB-1K 1.0 (test) | Natural Accuracy | 83.9 | 102 |
| Fine-grained visual classification | CUB-200-2011 (test) | Top-1 Acc | 0.898 | 70 |
| Fine-grained visual classification | Stanford Dogs (test) | Top-1 Acc | 92 | 52 |
| Fine-grained Visual Categorization | FGVC (CUB-200-2011, NABirds, Oxford Flowers, Stanford Cars, Stanford Dogs) (test) | CUB-200-2011 Accuracy | 89.8 | 32 |
| Image Classification | FGVC | Average Accuracy | 91 | 28 |
| Fine-grained Image Classification | Oxford Flowers (test) | Top-1 Accuracy | 99.6 | 24 |
