FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer
About
Recent work has explored adapting a pre-trained vision transformer (ViT) by updating only a few parameters to improve storage efficiency, an approach called parameter-efficient transfer learning (PETL). Current PETL methods have shown that, by tuning only 0.5% of the parameters, ViT can be adapted to downstream tasks with even better performance than full fine-tuning. In this paper, we aim to further improve the efficiency of PETL to meet the extreme storage constraints of real-world applications. To this end, we propose a tensorization-decomposition framework to store the weight increments, in which the weights of each ViT are tensorized into a single 3D tensor, and their increments are then decomposed into lightweight factors. During fine-tuning, only the factors need to be updated and stored, a procedure termed Factor-Tuning (FacT). On the VTAB-1K benchmark, our method performs on par with NOAH, the state-of-the-art PETL method, while being 5x more parameter-efficient. We also present a tiny version that uses only 8K trainable parameters (0.01% of ViT's parameters) but outperforms full fine-tuning and many other PETL methods such as VPT and BitFit. In few-shot settings, FacT also beats all PETL baselines while using the fewest parameters, demonstrating its strong capability in the low-data regime.
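The tensorization-decomposition idea can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the factorization, so the form used here (each increment written as `U @ S[n] @ V.T`, with `U` and `V` shared across all matrices) is an assumption, as are the function names `make_factors`, `increments`, and `count_params` and the toy sizes. The sketch only shows why storing factors is cheaper than storing the dense 3D increment tensor.

```python
import numpy as np


def make_factors(num_mats, dim, rank, seed=0):
    """Create hypothetical factors for a stack of `num_mats` (dim x dim) increments.

    U and V are shared by every matrix; S holds one small (rank x rank)
    core per matrix. S starts at zero so the initial increment is zero
    and the adapted model initially matches the pre-trained one.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((dim, rank)) * 0.02
    V = rng.standard_normal((dim, rank)) * 0.02
    S = np.zeros((num_mats, rank, rank))
    return U, S, V


def increments(U, S, V):
    """Rebuild the 3D increment tensor: delta[n] = U @ S[n] @ V.T."""
    return np.einsum("ir,nrs,js->nij", U, S, V)


def count_params(U, S, V):
    """Number of values that would need to be trained and stored."""
    return U.size + S.size + V.size


if __name__ == "__main__":
    # Toy sizes chosen for illustration only; they are not from the paper.
    num_mats, dim, rank = 4, 16, 2
    U, S, V = make_factors(num_mats, dim, rank)
    delta = increments(U, S, V)
    print("increment tensor shape:", delta.shape)
    print("factor parameters:", count_params(U, S, V))
    print("dense increment parameters:", delta.size)
```

In this toy setting the factors hold far fewer values than the dense increment tensor, and the gap widens as the matrix dimension grows relative to the rank. The frozen pre-trained weights would be added to `delta[n]` at inference time; that step is omitted here.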
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Super-resolution | Manga109 | PSNR | 25.14 | 656 |
| Image Super-resolution | Set5 (test) | PSNR | 32.71 | 544 |
| Text-to-Image Retrieval | Flickr30K | R@1 | 59.3 | 460 |
| Text-to-Video Retrieval | MSR-VTT | Recall@1 | 38.7 | 313 |
| Image Super-resolution | Set14 (test) | PSNR | 29.03 | 292 |
| Image Super-resolution | Manga109 (test) | PSNR | 31.7 | 233 |
| Super-Resolution | Urban100 (test) | PSNR | 27.23 | 205 |
| Image Classification | VTAB 1K | Overall Mean Accuracy | 75.6 | 204 |
| Video-to-Text retrieval | MSR-VTT | Recall@1 | 39.8 | 157 |
| Image Classification | VTAB 1k (test) | Accuracy (Natural) | 80.6 | 121 |