Hyperbolic Vision Transformers: Combining Improvements in Metric Learning

About

Metric learning aims to learn a highly discriminative model that encourages the embeddings of similar classes to be close under the chosen metric and pushes apart those of dissimilar classes. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is used. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric.
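The pipeline described in the abstract (encoder output mapped to hyperbolic space, then a pairwise cross-entropy loss over distances) can be illustrated with a minimal sketch. It assumes the Poincaré ball model with curvature parameter `c` and uses the standard exponential map at the origin and the standard Poincaré distance; the abstract itself does not spell out these formulas. The function names, the temperature `tau`, and its default value are hypothetical choices for illustration, not taken from the authors' repository.

```python
import math

def expmap0(v, c=1.0):
    """Exponential map at the origin: send a Euclidean vector into the
    Poincare ball of curvature -c (assumed model, see lead-in)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    sq = lambda u: sum(a * a for a in u)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

def pairwise_ce_loss(anchor, positive, negatives, tau=0.2, c=1.0):
    """Cross-entropy over negative distances: the positive should be
    closer to the anchor than every negative. tau is a hypothetical
    temperature chosen for illustration."""
    logits = [-poincare_dist(anchor, z, c) / tau for z in [positive] + negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Usage: map three toy "encoder outputs" into the ball and score one pair.
anchor = expmap0([0.1, 0.0])
positive = expmap0([0.2, 0.0])
negative = expmap0([-1.0, 0.5])
loss = pairwise_ce_loss(anchor, positive, [negative])
```

This sketch has no safeguard for inputs with very large norms, where `tanh` saturates and the mapped point lands numerically on the boundary of the ball; a practical implementation would likely clip embeddings to stay strictly inside it.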

Aleksandr Ermolov, Leyla Mirvakhabova, Valentin Khrulkov, Nicu Sebe, Ivan Oseledets • 2022

Related benchmarks

Task	Dataset	Result
Image Retrieval	CUB-200-2011 (test)	Recall@1 85.6	251
Image Retrieval	Stanford Online Products (test)	Recall@1 85.9	231
Image Retrieval	CUB-200 2011	Recall@1 85.6	163
Image Retrieval	In-shop Clothes Retrieval Dataset	Recall@1 92.7	120
Image Retrieval	CARS 196	Recall@1 89.2	98
Image Retrieval	CUB	Recall@1 85.6	87
Image Retrieval	Stanford Online Products	Recall@1 85.9	64
Deep Metric Learning	CARS196	Recall@1 86.5	50
Image Retrieval	Cars	R@1 89.2	44
Image Retrieval	SOP	Recall@1 85.9	32

Showing 10 of 21 rows
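For readers unfamiliar with the metric in the table, Recall@K is commonly computed as the fraction of query images whose K nearest neighbours (excluding the query itself) contain at least one image of the same class. The sketch below follows that common definition; the function name `recall_at_k` and the default Euclidean distance are assumptions for illustration, and the evaluation protocol behind the listed numbers may differ in its details. The function returns a fraction, whereas the table reports percentages.

```python
import math

def recall_at_k(embeddings, labels, k=1, dist=math.dist):
    """Fraction of queries with a same-label item among their k nearest
    neighbours. `dist` is any callable distance (Euclidean by default)."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by its distance to the query.
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: dist(query, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / len(embeddings)

# Usage: two well-separated toy classes.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
score = recall_at_k(points, [0, 0, 1, 1], k=1)
```

Passing a different callable as `dist` allows the same routine to rank by a non-Euclidean distance.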
