A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality
About
Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely on geometric or distributional assumptions and can fail badly when these assumptions are violated. In this paper, we introduce a novel ID estimator based on nearest-neighbor distance ratios that involves simple calculations and achieves state-of-the-art results. Most importantly, we provide a theoretical analysis proving that our estimator is *universal*, namely, it converges to the true ID independently of the distribution generating the data. We present experimental results on benchmark manifolds and real-world datasets to demonstrate the performance of our estimator.
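The abstract does not spell out the estimator's formula, so the sketch below is not the paper's method. To give a concrete sense of what a nearest-neighbor distance-ratio estimator can look like, it substitutes the well-known TwoNN maximum-likelihood estimator (Facco et al.), which uses the ratio of the second to the first nearest-neighbor distance at each point. The function names `nn_ratio_id` and `sample_linear_subspace`, and the synthetic test data, are assumptions made for illustration only.

```python
import math
import random


def nn_ratio_id(points):
    """TwoNN-style ID estimate (a stand-in, not the paper's estimator).

    For each point, take r1 and r2, the distances to its first and second
    nearest neighbors. Under a locally uniform density the ratio r2 / r1 is
    Pareto distributed with shape d, so the maximum-likelihood estimate is
    d = N / sum(log(r2 / r1)).
    """
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        # Brute-force search for the two smallest distances from p.
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def sample_linear_subspace(n, intrinsic_dim, ambient_dim, seed=0):
    """Hypothetical test data: Gaussian points on a random linear subspace."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0, 1) for _ in range(ambient_dim)]
             for _ in range(intrinsic_dim)]
    pts = []
    for _ in range(n):
        coeffs = [rng.gauss(0, 1) for _ in range(intrinsic_dim)]
        pts.append([sum(c * b[k] for c, b in zip(coeffs, basis))
                    for k in range(ambient_dim)])
    return pts


if __name__ == "__main__":
    data = sample_linear_subspace(800, intrinsic_dim=3, ambient_dim=10)
    print("estimated ID:", nn_ratio_id(data))
```

On data like this, the estimate should land near the subspace dimension (3) rather than the ambient dimension (10). The brute-force neighbor search is quadratic in the number of points, so a spatial index would be the natural replacement for larger datasets.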
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Intrinsic Dimensionality Estimation | Benchmark Manifolds | MPE 5.52 | 76 |
| Intrinsic Dimensionality Estimation | 6D sphere (S6) embedded in R11 with Gaussian noise synthetic (test) | Average Estimated Dimension 6.1 | 42 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.01 | Average Estimated Dimension 10.02 | 14 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.0 | Average Estimated Dimension 10.05 | 14 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.1 | Average Estimated Dimension 10.47 | 14 |