Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
About
Dimensionality reduction is crucial both for visualization and preprocessing high dimensional data for machine learning. We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distribution on multiple levels. The core of the proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP in performance and visualization quality while being an order of magnitude faster in run-time. Furthermore, its interpretable mechanics, the ability to project new data, and the natural separation of data clusters in visualizations make it a general purpose unsupervised dimension reduction technique. In the paper, we argue about the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K. We perform comparisons with other state-of-the-art methods on multiple metrics and target dimensions highlighting its efficiency and performance. Code is available at https://github.com/koulakis/h-nne
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Dimensionality Reduction | CIFAR10 | Trustworthiness Score0.907 | 45 | |
| Classification | warpPIE 10P | Accuracy72.5 | 26 | |
| Dimensionality Reduction | F-MNIST | Triplet Centroid Accuracy92.5 | 10 | |
| Dimensionality Reduction | MNIST | Triplet Centroid Accuracy67.1 | 10 | |
| Classification | SAM561 | Accuracy83.8 | 8 | |
| Classification | HCL500 | Accuracy62.2 | 8 | |
| Classification | MC1374 | Accuracy62.3 | 8 | |
| Classification | PROT579 | Accuracy82.4 | 8 | |
| Classification | GA1457 | Accuracy0.774 | 8 | |
| Dimensionality Reduction | COIL20 | Triplet Centroid Accuracy79.9 | 5 |