Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Clustering with minimum spanning trees: How good can it be?

About

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.

Marek Gagolewski, Anna Cena, Maciej Bartoszuk, {\L}ukasz Brzozowski• 2023

Related benchmarks

TaskDatasetResultRank
ClusteringBenchmark dataset repository All labels v1.1.0
Cases with AR < 0.8 Count16
14
ClusteringBenchmark dataset repository Third-party labels only v1.1.0
Cases with AR < 0.87
14
Showing 2 of 2 rows

Other info

Follow for update