Parametric UMAP embeddings for representation and semi-supervised learning
About
UMAP is a non-parametric graph-based dimensionality reduction algorithm using applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) Compute a graphical representation of a dataset (fuzzy simplicial complex), and (2) Through stochastic gradient descent, optimize a low-dimensional embedding of the graph. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that Parametric UMAP performs comparably to its non-parametric counterpart while conferring the benefit of a learned parametric mapping (e.g. fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semi-supervised learning by capturing structure in unlabeled data.
Google Colab walkthrough: https://colab.research.google.com/drive/1WkXVZ5pnMrm17m0YgmtoNjM_XHdnE5Vp?usp=sharing
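The two steps described above, and the parametric variant of the second step, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It makes several simplifying assumptions: the fuzzy simplicial complex is replaced by unweighted k-nearest-neighbor edges, the low-dimensional similarity is fixed to q = 1 / (1 + d^2), the neural network is replaced by a hypothetical linear encoder `W`, and full-batch gradient descent stands in for stochastic updates. The function names (`knn_edges`, `pair_grads`, `fit_linear_encoder`) are illustrative only.

```python
import numpy as np

def knn_edges(X, k):
    """Step 1 (simplified): directed, unweighted k-nearest-neighbor edges."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]
    src = np.repeat(np.arange(X.shape[0]), k)
    return src, nbrs.ravel()

def pair_grads(Z, i, j, attract, eps=1e-3, clip=4.0):
    """Mean loss and per-pair gradient w.r.t. (z_i - z_j), q = 1 / (1 + d^2)."""
    diff = Z[i] - Z[j]
    d2 = (diff ** 2).sum(axis=1)
    if attract:                                 # graph edge: -log q
        loss = np.log1p(d2)
        coef = 2.0 / (1.0 + d2)
    else:                                       # negative sample: -log(1 - q)
        loss = np.log1p(d2) - np.log(d2 + eps)  # eps avoids log(0)
        coef = 2.0 / (1.0 + d2) - 2.0 / (d2 + eps)
    # Clipping keeps updates bounded; the result is an approximate gradient.
    grad = np.clip(coef[:, None] * diff, -clip, clip)
    return loss.mean(), grad / len(i)

def fit_linear_encoder(X, k=5, n_components=2, steps=200, lr=0.01, seed=0):
    """Step 2 (parametric): optimize encoder weights W, not the points Z."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = 0.01 * rng.standard_normal((X.shape[1], n_components))
    pos_i, pos_j = knn_edges(X, k)
    losses = []
    for _ in range(steps):
        # One random negative partner per edge, guaranteed different from i.
        neg_i = pos_i
        neg_j = (neg_i + rng.integers(1, n, size=neg_i.size)) % n
        Z = X @ W                               # embedding is a function of W
        gZ = np.zeros_like(Z)
        total = 0.0
        for i, j, attract in ((pos_i, pos_j, True), (neg_i, neg_j, False)):
            loss, g = pair_grads(Z, i, j, attract)
            np.add.at(gZ, i, g)                 # accumulate over repeated indices
            np.add.at(gZ, j, -g)
            total += loss
        W -= lr * (X.T @ gZ)                    # chain rule through Z = X @ W
        losses.append(total)
    return W, losses

# Toy data (an assumption for illustration): two Gaussian blobs in 5-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-3, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
W, losses = fit_linear_encoder(X)
Z = X @ W

# New points are embedded with a single forward pass, with no re-optimization.
X_new = rng.normal(3, 1, (4, 5))
Z_new = X_new @ W
print(Z.shape, Z_new.shape)
```

The last lines show the practical benefit the abstract mentions: because the embedding is a learned function of the input, unseen data can be embedded by applying the encoder directly. In practice the encoder would be a neural network trained on mini-batches of edges rather than a linear map.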
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | MNIST | Accuracy | 94.1 | 395 |
| Clustering | MNIST | NMI | 0.7824 | 92 |
| Classification | COIL-20 | Accuracy | 0.774 | 76 |
| Dimensionality Reduction | Cassin's | AUC RNX | 37.34 | 63 |
| Classification | MNIST | Accuracy | 94.2 | 55 |
| Dimensionality Reduction | CIFAR10 | Trustworthiness Score | 0.914 | 45 |
| Dimensionality Reduction | Retina | AUC R_NX Score | 0.3273 | 42 |
| Dimensionality Reduction | FMNIST | AUC R_NX Score | 36.62 | 42 |
| Dimensionality Reduction | MNIST | AUC R_NX Score | 31.94 | 42 |
| Classification | Activity | Accuracy | 90 | 34 |