
Learning Similarity Conditions Without Explicit Supervision

About

Many real-world tasks require models to compare images along multiple similarity conditions (e.g. similarity in color, category or shape). Existing methods often reason about these complex similarity relationships by learning condition-aware embeddings. While such embeddings aid models in learning different notions of similarity, they also limit their capability to generalize to unseen categories since they require explicit labels at test time. To address this deficiency, we propose an approach that jointly learns representations for the different similarity conditions and their contributions as a latent variable without explicit supervision. Comprehensive experiments across three datasets, Polyvore-Outfits, Maryland-Polyvore and UT-Zappos50k, demonstrate the effectiveness of our approach: our model outperforms the state-of-the-art methods, even those that are strongly supervised with pre-defined similarity conditions, on fill-in-the-blank, outfit compatibility prediction and triplet prediction tasks. Finally, we show that our model learns different visually-relevant semantic sub-spaces that allow it to generalize well to unseen categories.
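The approach described in the abstract can be sketched roughly as follows: a general embedding is projected into several condition sub-spaces by learned masks, and a small weight branch predicts soft weights over those conditions from the input pair itself, so no condition labels are needed at test time. The class and parameter names below (`ConditionalSimilarity`, `masks`, `W`) are illustrative assumptions, not the paper's actual implementation; this is a minimal NumPy sketch of the forward pass only, omitting the triplet-loss training described in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ConditionalSimilarity:
    """Sketch: similarity scoring with latent condition weights.

    Hypothetical structure (assumed, not from the paper's code): one
    learnable mask per similarity condition carves a sub-space out of a
    shared embedding, and a weight branch assigns soft weights over the
    conditions for each pair, treating the relevant condition as a
    latent variable rather than an explicit label.
    """

    def __init__(self, dim=8, n_conditions=3, seed=0):
        rng = np.random.default_rng(seed)
        # one learnable mask per similarity condition (e.g. color, shape)
        self.masks = rng.standard_normal((n_conditions, dim))
        # weight branch: maps the concatenated pair embedding to condition logits
        self.W = rng.standard_normal((n_conditions, 2 * dim))

    def similarity(self, x, y):
        # soft condition weights inferred from the pair itself
        # (latent -- no condition supervision at test time)
        w = softmax(self.W @ np.concatenate([x, y]))
        # distance within each masked condition sub-space
        d = np.array([np.linalg.norm((x - y) * m) for m in self.masks])
        # weighted combination of per-condition distances; lower = more similar
        return float(w @ d), w
```

In training, the masks and the weight branch would be learned jointly from triplets, so the model discovers which sub-space (e.g. color vs. category) explains each similarity judgment without ever seeing condition labels.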

Reuben Tan, Mariya I. Vasileva, Kate Saenko, Bryan A. Plummer • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Fill-In-The-Blank | Polyvore Disjoint (test) | FITB Accuracy | 53.67 | 20
Fill-In-The-Blank | Polyvore Standard (test) | FITB Accuracy | 59.1 | 12
Compatibility prediction | Polyvore Standard (test) | Compatibility AUC | 0.88 | 12
Compatibility prediction | Polyvore Disjoint (test) | Compatibility AUC | 0.82 | 12
