LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

About

We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features for minimal extra inference cost. Furthermore, we demonstrate that LiFT can be applied with approaches that use additional task-specific downstream modules, as we integrate LiFT with ViTDet for COCO detection and segmentation. Despite the simplicity of LiFT, we find that it is not simply learning a more complex version of bilinear interpolation. Instead, our LiFT training protocol leads to several desirable emergent properties that benefit ViT features in dense downstream tasks. These include greater scale invariance for features and better object boundary maps. By simply training LiFT for a few epochs, we show improved performance on keypoint correspondence, detection, segmentation, and object discovery tasks. Overall, LiFT provides an easy way to unlock the benefits of denser feature arrays for a fraction of the computational cost. For more details, refer to our project page at https://www.cs.umd.edu/~sakshams/LiFT/.

Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava • 2024
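
The abstract contrasts LiFT with plain bilinear interpolation of the feature grid. As a point of reference, here is a minimal sketch of that baseline only, not of LiFT itself: it densifies an H x W grid of patch feature vectors by a factor of 2 in pure Python. The function name `bilinear_upsample`, the corner-aligned sampling convention, and the toy 2 x 2 grid are assumptions made for illustration and do not come from the paper.

```python
def bilinear_upsample(grid, factor=2):
    """Upsample an H x W grid of C-dim feature vectors by `factor`
    using bilinear interpolation with corner-aligned sampling."""
    h, w = len(grid), len(grid[0])
    c = len(grid[0][0])
    out_h, out_w = h * factor, w * factor
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column back to a fractional input column.
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            feat = []
            for k in range(c):
                # Blend the four neighbouring patch features per channel.
                top = (1 - wx) * grid[y0][x0][k] + wx * grid[y0][x1][k]
                bot = (1 - wx) * grid[y1][x0][k] + wx * grid[y1][x1][k]
                feat.append((1 - wy) * top + wy * bot)
            row.append(feat)
        out.append(row)
    return out


# Toy 2 x 2 grid of 1-dim "patch features" (hypothetical values).
patch_feats = [[[0.0], [1.0]],
               [[2.0], [3.0]]]
dense_feats = bilinear_upsample(patch_feats, factor=2)  # 4 x 4 grid
```

Every output feature here is a fixed weighted average of its four nearest patch features, so the denser grid carries no information beyond the original one. The abstract's claim is that a trained LiFT module does more than this kind of fixed blending.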

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 38.95	3069
Semantic segmentation	PASCAL VOC (val)	mIoU 80.97	380
Semantic segmentation	COCO Stuff (val)	mIoU 57.42	167
Semantic Correspondence	SPair-71k (test)	PCK@0.1 31.38	146
Video Object Segmentation	DAVIS	J & F Mean 69.78	128
Semantic segmentation	Pascal VOC 21 classes (val)	mIoU 0.7806	103
Semantic segmentation	COCO Stuff-27 (val)	mIoU 58.18	92
Unsupervised Object Discovery	COCO 20k	CorLoc 60.5	86
Semantic segmentation	Cityscapes	Mean IoU 60.08	68
Unsupervised Object Discovery	PASCAL VOC 2012	CorLoc 71.71	42

Showing 10 of 19 rows
