DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

About

Visual similarity plays an important role in many computer vision applications. Deep metric learning (DML) is a powerful framework for learning such similarities which not only generalize from training data to identically distributed test distributions, but in particular also translate to unknown test classes. However, its prevailing learning paradigm is class-discriminative supervised training, which typically results in representations specialized in separating training classes. For effective generalization, however, such an image representation needs to capture a diverse range of data characteristics. To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting. Through simultaneous optimization of our tasks we learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen • 2020

Related benchmarks

Task	Dataset	Result
Image Retrieval	CUB-200 2011	Recall@1: 69.2	163
Deep Metric Learning	CUB200 2011 (test)	Recall@1: 69.2	129
Image Retrieval	CARS 196	Recall@1: 92.9	102
Deep Metric Learning	CARS196 (test)	R@1: 87.6	56
Deep Metric Learning	CARS196	Recall@1: 87.6	50
Image Retrieval	SOP	Recall@1: 79.6	32
Deep Metric Learning	Stanford Online Products (SOP)	R@1: 79.6	20
Deep Metric Learning	SOP	Recall@1: 79.6	16
Image Retrieval	Cars (test)	Recall@1: 83.1	13
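
The results above are reported as Recall@1. As a minimal sketch, assuming the common retrieval definition (a query counts as a hit if at least one of its k nearest neighbours, excluding the query itself, shares its class label), the metric could be computed as below. The function name `recall_at_k`, the Euclidean distance, and the toy data are illustrative assumptions; the listing does not state the exact evaluation protocol used by each benchmark.

```python
import math


def recall_at_k(embeddings, labels, k=1):
    """Fraction of queries with at least one same-label sample among
    their k nearest neighbours (the query itself is excluded)."""
    n = len(embeddings)
    hits = 0
    for i in range(n):
        # Euclidean distance from query i to every other sample.
        dists = [
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        neighbours = [j for _, j in dists[:k]]
        if any(labels[j] == labels[i] for j in neighbours):
            hits += 1
    return hits / n


# Hypothetical toy embeddings: two classes, one sample placed near the wrong cluster.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
toy_labels = ["a", "a", "b", "b", "b"]

score = recall_at_k(toy_embeddings, toy_labels, k=1)
print(f"Recall@1 = {100 * score:.1f}")
```

The function returns a fraction in [0, 1]; the table reports the same quantity as a percentage.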

