
FastUMAP: Scalable Dimensionality Reduction via Bipartite Landmark Sampling

About

Exploratory analysis of high-dimensional data rarely stops at a single embedding. In practice, analysts rerun dimensionality reduction after changing preprocessing, subsets, or hyperparameters, and standard nonlinear methods can quickly become the bottleneck. We introduce FastUMAP (Bipartite Manifold Approximation and Projection), a landmark-based method designed for this repeated-use setting. FastUMAP builds a sparse point-landmark fuzzy graph, computes a Nyström spectral warm start from the induced landmark affinity, and then refines all sample coordinates with a UMAP-style objective on the bipartite graph. The landmark ratio r = m/n provides a direct way to trade runtime against fidelity. On 9 benchmark datasets spanning 178 to 70,000 samples, FastUMAP has the lowest runtime on 7 datasets in our reported default-implementation comparison on one workstation. On MNIST and Fashion-MNIST (n = 70,000), it runs in about 4.6 seconds, compared with about 73 to 75 seconds for Barnes-Hut t-SNE, while reaching 91.4% mean kNN accuracy versus 94.6% for the strongest accuracy baseline. FastUMAP is therefore best viewed as a fast option for repeated exploratory embedding, rather than as a replacement for accuracy-first methods.
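
To make the first step of the pipeline concrete, the following is a minimal sketch of a sparse point-landmark graph, not the authors' implementation. It assumes uniform random landmark sampling, m = ceil(r * n) landmarks, and a UMAP-like exponential edge weight. The function name bipartite_landmark_graph, the parameter k (landmark neighbors per point), and the local scale sigma are all illustrative choices that the abstract does not specify. The spectral warm start and the refinement step are not shown.

```python
import math
import random


def bipartite_landmark_graph(points, ratio=0.1, k=3, seed=0):
    """Link every point to its k nearest landmarks (hypothetical helper).

    Returns (landmarks, edges): landmarks holds the sampled point indices,
    and edges[i] is a list of (landmark_slot, weight) pairs for point i.
    """
    n = len(points)
    # m = ceil(r * n) landmarks, kept between k and n (assumed rounding rule).
    m = min(n, max(k, math.ceil(ratio * n)))
    rng = random.Random(seed)
    landmarks = rng.sample(range(n), m)  # uniform sampling is an assumption

    edges = []
    for p in points:
        # Distances from this point to every landmark, nearest first.
        nearest = sorted(
            (math.dist(p, points[j]), slot) for slot, j in enumerate(landmarks)
        )[:k]
        rho = nearest[0][0]  # distance to the closest landmark
        # Local scale: mean excess distance; fall back to 1.0 if it is zero.
        sigma = (sum(d for d, _ in nearest) / len(nearest) - rho) or 1.0
        # Assumed UMAP-like kernel: the closest landmark always gets weight 1.
        edges.append([(slot, math.exp(-(d - rho) / sigma)) for d, slot in nearest])
    return landmarks, edges


# Toy usage: 200 random 3-D points with landmark ratio r = 0.1.
data_rng = random.Random(1)
points = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(200)]
landmarks, edges = bipartite_landmark_graph(points, ratio=0.1, k=3)
```

Each point stores only k edges, so the graph has n * k nonzero entries regardless of m. The ratio r controls how many landmarks each point is compared against, so in this sketch graph construction costs roughly n * m distance evaluations, and lowering r reduces that cost directly.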

Hongmin Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Breast cancer | Accuracy (%) | 96.4 | 61 |
| KNN Classification | Shuttle | Accuracy (%) | 99.5 | 35 |
| KNN Classification | Fashion MNIST | Accuracy (%) | 74.7 | 35 |
| KNN Classification | MNIST | Accuracy (%) | 90.3 | 35 |
| Classification | Wine | Accuracy (%) | 95.5 | 13 |
| Dimensionality Reduction | MNIST | Runtime (s) | 4.65 | 5 |
| Dimensionality Reduction | F-MNIST | Runtime (s) | 4.59 | 5 |
| Dimensionality Reduction | Dermatology | Runtime (s) | 0.019 | 5 |
| Dimensionality Reduction | mfeat | Runtime (s) | 0.128 | 5 |
| Dimensionality Reduction | Spambase | Runtime (s) | 0.246 | 5 |
Showing 10 of 18 rows
