# 1%>100%: High-Efficiency Visual Adapter with Complex Linear Projection Optimization

## About
Deploying vision foundation models typically relies on efficient adaptation strategies, since conventional full fine-tuning suffers from prohibitive costs and low efficiency. While delta-tuning has proven effective at boosting both the performance and the efficiency of LLMs during adaptation, its advantages do not transfer directly to the fine-tuning pipeline of vision foundation models. To push the boundaries of adaptation efficiency for vision tasks, we propose an adapter with Complex Linear Projection Optimization (CoLin). For architecture, we design a novel low-rank complex adapter that adds only about 1% additional parameters relative to the backbone. For efficiency, we theoretically prove that low-rank composite matrices suffer from severe convergence issues during training, and we address this challenge with a tailored loss. Extensive experiments on object detection, segmentation, image classification, and rotated object detection (a remote sensing scenario) demonstrate that CoLin outperforms both full fine-tuning and classical delta-tuning approaches with merely 1% of the parameters, for the first time, providing a novel and efficient solution for the deployment of vision foundation models. The code is available at https://github.com/DongshuoYin/CoLin.
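To make the "about 1% parameters" claim concrete, the sketch below counts the parameters of a complex low-rank factorization of a single square projection against full fine-tuning of that projection. The hidden size, rank, and function names are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: parameter budget of a low-rank complex linear projection
# of the kind CoLin's adapter is described as using. The hidden size d and
# rank r below are hypothetical choices, not values from the paper.

def param_count_full(d: int) -> int:
    # Full fine-tuning of one d x d projection: d^2 real parameters.
    return d * d

def param_count_complex_lowrank(d: int, r: int) -> int:
    # Two complex low-rank factors A (d x r) and B (r x d);
    # each complex entry stores a real and an imaginary part.
    return 2 * (d * r + r * d)

d, r = 768, 2  # e.g. a ViT-Base hidden size with a small adapter rank
ratio = param_count_complex_lowrank(d, r) / param_count_full(d)
print(f"adapter adds {ratio:.2%} of the full projection's parameters")
# prints: adapter adds 1.04% of the full projection's parameters
```

With rank 2 and hidden size 768 the factorized update costs roughly 1% of the dense projection, which matches the order of magnitude the abstract cites; the exact budget in CoLin depends on where the adapters are inserted and which ranks are chosen.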
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K (val) | mIoU | 51.28 | 2731 |
| Image Classification | Flowers102 (test) | Accuracy | 99.6619 | 68 |
| Oriented Object Detection | STAR (test) | AP | 39.22 | 60 |
| Rotated Object Detection | DOTA 1.0 (test) | mAP | 78.39 | 46 |
| Object Detection | Pascal VOC (test) | mAP | 87.5 | 18 |
| Instance Segmentation | COCO | AP (mask) | 45.5 | 10 |
| Object Detection | COCO | AP (box) | 52.9 | 10 |