
Estimating the effective dimension of large biological datasets using Fisher separability analysis

About

Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds often possess structure that significantly decreases their intrinsic dimensionality (ID): clusters, points located close to low-dimensional varieties, or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets. We show that this measure of ID performs competitively with state-of-the-art measures: it is efficient across a wide range of dimensions and performs better on noisy samples. Moreover, it allows the intrinsic dimension to be estimated in situations where the intrinsic manifold assumption is not valid.
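To make the idea of a separability-based estimator concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a common formulation in which the data are centered, whitened, and projected onto the unit sphere, and a point x counts as inseparable from y when (x, y) > alpha * (x, x). The observed fraction of inseparable pairs is then matched to the asymptotic value for points uniformly distributed on a sphere in R^n. The function names, the threshold `alpha`, and the component cutoff `cond` are illustrative assumptions, not details given on this page.

```python
import numpy as np

def inseparability_fraction(X, alpha=0.8, cond=10.0):
    """Mean fraction of other points that each point is NOT Fisher-separable from."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                          # center the cloud
    # Whitening via SVD: the columns of U are the decorrelated, rescaled data.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    keep = s**2 >= (s.max() ** 2) / cond            # assumed cutoff for weak components
    Z = U[:, keep]
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z / np.maximum(norms, 1e-12)                # project onto the unit sphere
    G = Z @ Z.T                                     # pairwise inner products
    np.fill_diagonal(G, -np.inf)                    # ignore self-comparisons
    # On the unit sphere (x, x) = 1, so x is inseparable from y when (x, y) > alpha.
    return float((G > alpha).sum() / (len(Z) * (len(Z) - 1)))

def dimension_from_fraction(p, alpha=0.8):
    """Find n such that the uniform-sphere model matches the observed fraction p."""
    def model(n):
        # Asymptotic inseparability probability for a uniform sphere in R^n.
        return (1 - alpha**2) ** ((n - 1) / 2) / (alpha * np.sqrt(2 * np.pi * n))
    lo, hi = 1e-3, 1e4                              # model(n) decreases with n
    for _ in range(200):                            # plain bisection
        mid = 0.5 * (lo + hi)
        if model(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `dimension_from_fraction(inseparability_fraction(X))` on an array `X` of shape (samples, features) returns a single global estimate. If no inseparable pairs are observed (p = 0), the bisection runs to its upper bound, so the result is only meaningful when the sample is large enough for the chosen `alpha`.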

Luca Albergante, Jonathan Bac, Andrei Zinovyev • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Intrinsic Dimensionality Estimation | Benchmark Manifolds | MPE: 40.92 | 76 |
| Intrinsic Dimensionality Estimation | 6D sphere (S6) embedded in R11 with Gaussian noise synthetic (test) | Average Estimated Dimension: 5.63 | 42 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.0 | Average Estimated Dimension: 11 | 14 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.01 | Average Estimated Dimension: 7.87 | 14 |
| Intrinsic Dimension Estimation | S10 manifold embedded in R11 sigma = 0.1 | Average Estimated Dimension: 5.82 | 14 |