Let Go of Your Labels with Unsupervised Transfer

About

Foundation vision-language models have enabled remarkable zero-shot transferability of pre-trained representations to a wide range of downstream tasks. However, to solve a new task, zero-shot transfer still requires human guidance to define the visual categories that appear in the data. Here, we show that fully unsupervised transfer emerges when searching for the labeling of a dataset that induces maximal margin classifiers in the representation spaces of different foundation models. We present TURTLE, a fully unsupervised method that employs this guiding principle to uncover the underlying labeling of a downstream dataset without any supervision or task-specific representation learning. We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance. Furthermore, TURTLE, despite being fully unsupervised, outperforms zero-shot transfer baselines on a wide range of datasets. In particular, TURTLE matches the average performance of CLIP zero-shot on 26 datasets when employing the same representation space, across a wide range of architectures and model sizes. By guiding the search for the underlying labeling using the representation spaces of two foundation models, TURTLE surpasses zero-shot transfer and unsupervised prompt tuning baselines, demonstrating the surprising power and effectiveness of unsupervised transfer.

Artyom Gadetsky, Yulun Jiang, Maria Brbic • 2024

Related benchmarks

Task	Dataset	Result
Image Classification	Food-101	Accuracy 92.2	570
Image Clustering	CIFAR-10	NMI 0.929	318
Image Clustering	STL-10	ACC 98.4	282
Image Classification	ImageNet	Accuracy 72.9	184
Clustering	MNIST (test)	--	136
Clustering	CIFAR-100 (test)	ACC 89.1	123
Image Clustering	CIFAR-100	ACC 46.4	111
Clustering	Fashion MNIST	NMI 72.3	107
Image Clustering	DTD	NMI 63.3	49
Image Clustering	ImageNet-1K	NMI 88.12	36

Showing 10 of 27 rows
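
The ACC and NMI columns above are clustering metrics: because an unsupervised method outputs arbitrary cluster ids, accuracy is conventionally computed after finding the best one-to-one mapping from clusters to classes. The following is a minimal sketch of both metrics, not the evaluation code behind these numbers. The function names are hypothetical, the matching is done by brute force over permutations (practical code would typically use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`), and the arithmetic-mean normalization for NMI is an assumption, since the page does not say which variant was used.

```python
from collections import Counter
from itertools import permutations
from math import log


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class ids.

    Brute force over permutations, so only suitable for a handful of classes.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("sketch assumes no more clusters than classes")
    # counts[(cluster, cls)] = number of samples in `cluster` whose true class is `cls`
    counts = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(counts[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Normalized mutual information, normalized by the mean of the two entropies (assumed variant)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    p_true = Counter(y_true)
    p_pred = Counter(y_pred)
    mi = sum(
        (c / n) * log(c * n / (p_true[t] * p_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    # Both labelings constant: treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    pred = [1, 1, 0, 0, 2, 2]  # same partition, permuted cluster ids
    print(clustering_accuracy(truth, pred), nmi(truth, pred))
```

Both metrics are invariant to how cluster ids are named, which is why a permuted but otherwise identical partition scores as a perfect match.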
