AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers
About
Frozen-backbone transfer with Vision Transformers faces two under-addressed issues: optimization instability when adapters are naively inserted into a fixed feature extractor, and the absence of principled guidance for setting adapter capacity. We introduce AdapterTune, which augments each transformer block with a residual low-rank bottleneck whose up-projection is zero-initialized. This guarantees that the adapted network starts exactly at the pretrained function and eliminates early-epoch representation drift. On the analytical side, we formalize adapter rank as a capacity budget for approximating downstream task shifts in feature space. The resulting excess-risk decomposition predicts monotonic but diminishing accuracy gains with increasing rank, an "elbow" behavior we confirm through controlled sweeps. We evaluate on 9 datasets and 3 backbone scales with multi-seed reporting throughout. On a core 5-dataset transfer suite, AdapterTune improves top-1 accuracy over head-only transfer by +14.9 points on average while training only 0.92% of the parameters required by full fine-tuning, and outperforms full fine-tuning on 10 of 15 dataset-backbone pairs. Across the full benchmark, AdapterTune improves over head-only transfer on every dataset-backbone pair tested. Ablations on rank, placement, and initialization isolate each design choice. The code is available at: https://github.com/salimkhazem/adaptertune
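The zero-initialized residual bottleneck described above can be illustrated with a minimal NumPy sketch. It is not taken from the AdapterTune repository: the class name `LowRankAdapter`, the ReLU nonlinearity, the 0.02 initialization scale, and the sizes (`dim = 768`, `rank = 16`) are all assumptions chosen for illustration. The sketch only shows why a zero up-projection makes the adapted block reproduce its input exactly at initialization.

```python
import numpy as np


class LowRankAdapter:
    """Hypothetical residual bottleneck: x + act(x @ W_down) @ W_up."""

    def __init__(self, dim, rank, rng):
        # Down-projection: small random init (the 0.02 scale is an assumption).
        self.w_down = rng.normal(0.0, 0.02, size=(dim, rank))
        # Up-projection: all zeros, so the adapter branch contributes nothing
        # until training updates these weights.
        self.w_up = np.zeros((rank, dim))

    def __call__(self, x):
        # ReLU is an assumed nonlinearity; the abstract does not specify one.
        hidden = np.maximum(x @ self.w_down, 0.0)
        # Residual connection: frozen features plus the low-rank correction.
        return x + hidden @ self.w_up

    def num_params(self):
        # Trainable parameters scale as 2 * dim * rank (biases omitted here).
        return self.w_down.size + self.w_up.size


rng = np.random.default_rng(0)
dim, rank = 768, 16                      # illustrative sizes, not from the paper
adapter = LowRankAdapter(dim, rank, rng)

tokens = rng.normal(size=(4, 197, dim))  # (batch, tokens, dim) stand-in features
out = adapter(tokens)

# At initialization the adapted output equals the frozen features exactly.
identity_at_init = np.array_equal(out, tokens)
```

Because the parameter count grows linearly in `rank`, the rank acts as a direct knob on adapter capacity, which is the quantity the abstract treats as a budget.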
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | Flowers102 | Accuracy | 99.43 | 558 |
| Image Classification | Food-101 | -- | -- | 542 |
| Image Classification | CIFAR-10 | Accuracy | 98.9 | 508 |
| Image Classification | Tiny-ImageNet | Top-1 Accuracy | 90 | 230 |
| Image Classification | ImageNet-R | -- | -- | 217 |
| Image Classification | CIFAR-100 | Accuracy | 91.2 | 117 |
| Image Classification | FGVC Aircraft | Top-1 Acc | 74.79 | 92 |
| Image Classification | Oxford-IIIT Pet | Top-1 Accuracy | 94.3 | 55 |
| Image Classification | SVHN | Accuracy | 97.5 | 43 |