Clustering with minimum spanning trees: How good can it be?
About
Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | Benchmark dataset repository All labels v1.1.0 | Cases with AR < 0.8 Count16 | 14 | |
| Clustering | Benchmark dataset repository Third-party labels only v1.1.0 | Cases with AR < 0.87 | 14 |