Clustering with minimum spanning trees: How good can it be?

About

Minimum spanning trees (MSTs) provide a convenient representation of datasets in numerous pattern recognition activities. Moreover, they are relatively fast to compute. In this paper, we quantify the extent to which they are meaningful in low-dimensional partitional data clustering tasks. By identifying the upper bounds for the agreement between the best (oracle) algorithm and the expert labels from a large battery of benchmark data, we discover that MST methods can be very competitive. Next, we review, study, extend, and generalise a few existing, state-of-the-art MST-based partitioning schemes. This leads to some new noteworthy approaches. Overall, the Genie and the information-theoretic methods often outperform the non-MST algorithms such as K-means, Gaussian mixtures, spectral clustering, Birch, density-based, and classical hierarchical agglomerative procedures. Nevertheless, we identify that there is still some room for improvement, and thus the development of novel algorithms is encouraged.

Marek Gagolewski, Anna Cena, Maciej Bartoszuk, {\L}ukasz Brzozowski• 2023

Related benchmarks

Task	Dataset	Result	Rank
Clustering	Benchmark dataset repository All labels v1.1.0	Cases with AR < 0.8 Count16		14
Clustering	Benchmark dataset repository Third-party labels only v1.1.0	Cases with AR < 0.87		14

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord