
The Wasserstein transform

About

We introduce the Wasserstein Transform (WT), a general unsupervised framework for updating distance structures on given data sets in order to enhance features and denoise. Our framework represents each data point by a probability measure reflecting the neighborhood structure of the point, and then updates the distance by computing the Wasserstein distance between these probability measures. The Wasserstein Transform is a general method that extends the mean shift family of algorithms. We study several instances of WT. In particular, in one instance, which we call the Gaussian Transform (GT), we use Gaussian measures to model the neighborhood structures of individual data points. GT is computationally cheaper than other instances of WT because a closed-form solution exists for the $\ell^2$-Wasserstein distance between Gaussian measures. We study the relationship between the different instances of WT and prove that each instance is stable under perturbations. We devise iterative algorithms for performing WT and propose several strategies to accelerate GT, such as an observation from linear algebra that reduces the number of matrix square root computations. We examine the performance of the Wasserstein Transform on tasks such as denoising, clustering, image segmentation and word embeddings.
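A minimal sketch of a single Gaussian Transform step may help make the abstract concrete. It rests on assumptions of our own, not the paper's: each point's neighborhood is the closed Euclidean ε-ball around it, the measure on that ball is uniform, and the Gaussian N_i is fitted by the ball's mean μ_i and biased covariance Σ_i. The function names (`local_gaussians`, `gaussian_w2`, `gaussian_transform`) and the parameter `eps` are hypothetical. The new distance is the closed-form ℓ²-Wasserstein distance W2(N_i, N_j)² = ||μ_i − μ_j||² + tr(Σ_i) + tr(Σ_j) − 2 tr((Σ_i^{1/2} Σ_j Σ_i^{1/2})^{1/2}).

```python
import numpy as np


def psd_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (V * np.sqrt(w)) @ V.T


def local_gaussians(X, eps):
    """Mean and covariance of the uniform measure on each point's eps-ball.

    Assumption: closed Euclidean ball; it always contains the point itself,
    so no neighborhood is empty.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    means, covs = [], []
    for i in range(len(X)):
        P = X[D[i] <= eps]          # neighbors of x_i (including x_i)
        mu = P.mean(axis=0)         # flat-kernel mean shift step of x_i
        diff = P - mu
        means.append(mu)
        covs.append(diff.T @ diff / len(P))  # biased (1/|N_i|) covariance
    return np.array(means), np.array(covs)


def gaussian_w2(m1, S1, m2, S2, S1_sqrt=None):
    """Closed-form l2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    S1, S2 = np.asarray(S1, dtype=float), np.asarray(S2, dtype=float)
    if S1_sqrt is None:
        S1_sqrt = psd_sqrt(S1)
    M = S1_sqrt @ S2 @ S1_sqrt
    # tr(M^{1/2}) needs only the eigenvalues of M, not the matrix M^{1/2} itself.
    tr_cross = np.sqrt(np.clip(np.linalg.eigvalsh(M), 0.0, None)).sum()
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1) + np.trace(S2) - 2.0 * tr_cross
    return np.sqrt(max(d2, 0.0))  # clip small negative values from round-off


def gaussian_transform(X, eps):
    """One hypothetical GT step: new distance matrix W[i, j] = W2(N_i, N_j)."""
    X = np.asarray(X, dtype=float)
    means, covs = local_gaussians(X, eps)
    roots = [psd_sqrt(S) for S in covs]  # n matrix square roots, reused for all pairs
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = gaussian_w2(means[i], covs[i], means[j], covs[j], roots[i])
    return W


# Usage: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
W = gaussian_transform(X, eps=0.5)
```

In this example every point of a cluster sees the same ε-ball, so within-cluster distances collapse to (numerically) zero while the between-cluster distance stays large, which illustrates the denoising and feature-enhancing effect described above. The local mean μ_i is exactly one flat-kernel mean shift step applied to x_i, which is one way to see the connection to mean shift. Precomputing each Σ_i^{1/2} once and using only eigenvalues for the cross term is a generic speed-up chosen for this sketch; it is not necessarily the linear-algebra observation the authors use.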

Kun Jin, Facundo Mémoli, Zane Smith, Zhengchao Wan • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Word Similarity | RG-65 | Spearman Correlation | 0.62 | 41
Word Similarity | SimLex-999 | Spearman Correlation | 27 | 31
Word Similarity | Mechanical Turk-771 | Spearman ρ | 0.56 | 8
Word Similarity | MC-30 | Spearman Correlation | 0.67 | 6
Word Similarity | MTurk-287 | Spearman Correlation | 0.62 | 6
Word Similarity | MEN 3k (train) | Spearman Correlation | 0.65 | 6
Word Similarity | RW-STANFORD | Spearman Correlation | 0.38 | 6
Word Similarity | WS-353 SIM | Spearman Correlation | 0.6 | 6
Word Similarity | WS-YP-130 | Spearman Correlation | 0.37 | 6
Word Similarity | WS-353 ALL | Spearman Correlation | 0.51 | 6

(10 of 13 rows shown.)
