Parametric UMAP embeddings for representation and semi-supervised learning

About

UMAP is a non-parametric, graph-based dimensionality reduction algorithm that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) compute a graphical representation of a dataset (a fuzzy simplicial complex), and (2) optimize a low-dimensional embedding of the graph through stochastic gradient descent. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g. fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data.

Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
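The two steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it makes several simplifying assumptions: the "network" is a single linear map `W`, the graph weights use a fixed per-point bandwidth instead of UMAP's binary search, the low-dimensional kernel is q = 1 / (1 + d^2), and the full cross-entropy is minimized by plain gradient descent with step halving rather than by stochastic gradient descent with negative sampling. The function names `fuzzy_graph`, `loss_and_grad`, and `fit_linear_umap` are hypothetical.

```python
import numpy as np

def fuzzy_graph(X, k=5):
    """Step 1 (simplified): symmetric kNN membership strengths in [0, 1]."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                   # a point is not its own neighbor
    nbrs = np.argsort(D, axis=1)[:, :k]           # indices of the k nearest neighbors
    d_nn = np.take_along_axis(D, nbrs, axis=1)    # their distances, ascending
    rho = d_nn[:, :1]                             # distance to the nearest neighbor
    # Assumption: a simple per-point bandwidth, not UMAP's binary search.
    sigma = (d_nn - rho).mean(axis=1, keepdims=True) + 1e-12
    P = np.zeros((n, n))
    np.put_along_axis(P, nbrs, np.exp(-(d_nn - rho) / sigma), axis=1)
    return P + P.T - P * P.T                      # fuzzy union makes P symmetric

def loss_and_grad(W, X, P, eps=1e-3):
    """Step 2: cross-entropy between P and q = 1 / (1 + d2) for Z = X @ W."""
    Z = X @ W
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1) + eps
    off = ~np.eye(len(X), dtype=bool)             # ignore self-pairs
    # -p*log(q) - (1-p)*log(1-q) simplifies to log(1+d2) - (1-p)*log(d2).
    loss = 0.5 * (np.log1p(d2) - (1 - P) * np.log(d2))[off].sum()
    coef = np.where(off, 1 / (1 + d2) - (1 - P) / d2, 0.0)   # dLoss/d(d2) per pair
    G = 2 * (coef.sum(1, keepdims=True) * Z - coef @ Z)      # dLoss/dZ
    return loss, X.T @ G                                     # chain rule to W

def fit_linear_umap(X, P, dim=2, steps=200, lr=1e-2, seed=0):
    """Gradient descent on the parameters W, halving the step until loss drops."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    loss, grad = loss_and_grad(W, X, P)
    history = [loss]
    for _ in range(steps):
        step = lr
        while step > 1e-10:
            W_try = W - step * grad
            loss_try, grad_try = loss_and_grad(W_try, X, P)
            if loss_try < loss:                   # accept only improving steps
                W, loss, grad = W_try, loss_try, grad_try
                break
            step /= 2
        history.append(loss)
    return W, history

# Toy data (an assumption for illustration): two Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(4, 1, (30, 10))])
P = fuzzy_graph(X, k=5)
W, history = fit_linear_umap(X, P)

# Because the mapping is parametric, new points are embedded with one
# matrix product and no further optimization.
X_new = rng.normal(4, 1, (5, 10))
Z_new = X_new @ W
```

The last two lines show the benefit the abstract highlights: once the parameters are learned, embedding unseen data is a single forward pass. Replacing the linear map with a neural network changes only how `Z` and its gradient are computed; the graph and the loss stay the same.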

Tim Sainburg, Leland McInnes, Timothy Q Gentner • 2020

Related benchmarks

Task	Dataset	Metric	Result	
Image Classification	MNIST	Accuracy	94.1	417
Clustering	MNIST	NMI	0.7824	113
Classification	COIL-20	Accuracy	0.774	96
Image Classification	EMNIST	Accuracy	77.5	90
Classification	MNIST	Accuracy	94.2	89
Classification	Colon	Accuracy	91.8	78
Dimensionality Reduction	Cassin's	AUC RNX	37.34	63
Dimensionality Reduction	CIFAR10	Trustworthiness Score	0.914	45
Dimensionality Reduction	Retina	AUC R_NX Score	0.3273	42
Dimensionality Reduction	FMNIST	AUC R_NX Score	36.62	42

Showing 10 of 56 rows
