
On the Surprising Effectiveness of Attention Transfer for Vision Transformers

About

Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations. Is this actually true? We investigate this question and find that the features and representations learned during pre-training are not essential. Surprisingly, using only the attention patterns from pre-training (i.e., guiding how information flows between tokens) is sufficient for models to learn high-quality features from scratch and achieve comparable downstream performance. We show this by introducing a simple method called attention transfer, where only the attention patterns from a pre-trained teacher ViT are transferred to a student, either by copying or distilling the attention maps. Since attention transfer lets the student learn its own features, ensembling it with a fine-tuned teacher also further improves accuracy on ImageNet. We systematically study various aspects of our findings on the sufficiency of attention maps, including distribution shift settings where they underperform fine-tuning. We hope our exploration provides a better understanding of what pre-training accomplishes and leads to a useful alternative to the standard practice of fine-tuning.
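To make the "distilling the attention maps" variant concrete, here is a minimal numpy sketch of an attention-distillation loss: the student is trained to match the teacher's row-stochastic attention patterns (softmax(QK^T/√d)) while learning its own value/feature pathway. All function names, shapes, and the choice of KL divergence as the matching loss are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(q, k):
    """Row-stochastic attention pattern: softmax(Q K^T / sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d))

def attention_distill_loss(student_attn, teacher_attn, eps=1e-9):
    """Mean KL(teacher || student) over query positions.

    Zero when the student's attention maps match the teacher's exactly;
    gradients of this term would push the student's Q/K toward producing
    the teacher's information-flow pattern, while its features stay free.
    """
    kl = (teacher_attn
          * (np.log(teacher_attn + eps) - np.log(student_attn + eps))).sum(-1)
    return kl.mean()

# Toy example: one attention head, 4 tokens, head dim 8 (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, dim = 4, 8
q_t, k_t = rng.normal(size=(2, n_tokens, dim))  # frozen teacher projections
q_s, k_s = rng.normal(size=(2, n_tokens, dim))  # student projections

t_attn = attention_map(q_t, k_t)
s_attn = attention_map(q_s, k_s)
loss = attention_distill_loss(s_attn, t_attn)  # positive for mismatched maps
print(float(loss))
```

The "copying" variant in the paper corresponds to skipping this loss entirely and substituting `t_attn` for `s_attn` in the student's forward pass, so the student only learns the non-attention parameters.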

Alexander C. Li, Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | – | – | 2454 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 86.3 | 1866 |
| Instance Segmentation | COCO 2017 (val) | – | – | 1144 |
| Image Classification | ImageNet-A | Top-1 Acc | 54.3 | 553 |
| Image Classification | ImageNet-V2 | – | – | 487 |
| Image Classification | ImageNet-R | Accuracy | 57.5 | 148 |
| Image Classification | ImageNet-S | Top-1 Acc | 43.1 | 43 |
| Long-tailed Visual Recognition | iNaturalist 2017 (test) | Accuracy | 69.3 | 16 |
| Long-tailed Recognition | iNaturalist 2018 | – | – | 7 |
| Long-tailed Recognition | iNat 2019 | Accuracy | 80 | 4 |
