Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

About

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority, easing the storage burden and optimization difficulty. However, existing PEFT methods introduce trainable parameters at the same positions across different tasks, relying solely on human heuristics and neglecting domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, for weight matrices whose number of sensitive parameters exceeds a pre-defined threshold, SPT further boosts representational capability by using existing structured tuning methods, e.g., LoRA [23] or Adapter [22], in place of directly tuning the selected sensitive parameters (unstructured tuning), while staying within the budget. Extensive experiments on a wide range of downstream recognition tasks show that SPT is complementary to existing PEFT methods and largely boosts their performance, e.g., SPT improves Adapter with a supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on the FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT
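The two-step allocation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `allocate_trainable`, the layer names, and the hand-written sensitivity scores are all hypothetical, and the abstract does not say how the scores are computed. The sketch also ignores the parameter cost of the structured module itself, which a real budget would have to account for.

```python
from typing import Dict, List, Tuple


def allocate_trainable(
    scores: Dict[str, List[float]],
    budget: int,
    threshold: int,
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Pick the `budget` highest-scoring parameters across all matrices.

    Returns (unstructured, structured):
      unstructured maps a matrix name to the flat indices tuned directly;
      structured lists matrices whose selected count exceeds `threshold`
      and would instead receive a structured module (e.g. LoRA or Adapter).
    """
    # Flatten to (score, name, index) triples and rank them globally.
    ranked = sorted(
        ((s, name, i) for name, vals in scores.items() for i, s in enumerate(vals)),
        key=lambda t: t[0],
        reverse=True,
    )

    # Keep only as many parameters as the budget allows.
    selected: Dict[str, List[int]] = {}
    for _, name, i in ranked[:budget]:
        selected.setdefault(name, []).append(i)

    # Matrices with many sensitive parameters switch to structured tuning.
    structured = sorted(n for n, idx in selected.items() if len(idx) > threshold)
    unstructured = {
        n: sorted(idx) for n, idx in selected.items() if len(idx) <= threshold
    }
    return unstructured, structured


# Hypothetical per-parameter sensitivity scores for three weight matrices.
toy_scores = {
    "attn.qkv": [0.9, 0.8, 0.7, 0.1],
    "mlp.fc1": [0.6, 0.05, 0.02, 0.01],
    "mlp.fc2": [0.03, 0.02, 0.01, 0.0],
}
unstructured, structured = allocate_trainable(toy_scores, budget=4, threshold=2)
print(structured)    # matrices that would get a structured module
print(unstructured)  # indices tuned directly in the remaining matrices
```

With these toy numbers, three of the four selected parameters fall in `attn.qkv`, so that matrix crosses the threshold and is marked for structured tuning, while the single selected entry in `mlp.fc1` is tuned directly.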

Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, Bohan Zhuang • 2023

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 45.4	3069
Image Classification	ImageNet-R	Top-1 Acc 72.6	581
Image Classification	Food-101	--	570
Image Classification	SVHN	--	395
Image Classification	CIFAR-100	--	302
Image Classification	VTAB 1K	Overall Mean Accuracy 76.4	281
Image Classification	VTAB-1K 1.0 (test)	Natural Accuracy 76.6	102
Visual Task Adaptation	VTAB 1K	Average Accuracy 76.4	78
Fine-grained Visual Categorization	FGVC	Mean Accuracy 90.1	40
Visual Task Adaptation	VTAB-1k v1 (test)	Mean Accuracy 78.7	34

Showing 10 of 24 rows
