Efficient Centroid-Linkage Clustering
About
We give an efficient algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC), which computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new Centroid-Linkage HAC algorithm with a novel fully dynamic data structure for nearest neighbor search which works under adaptive updates. We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate Centroid-Linkage HAC algorithm. Compared to an existing state-of-the-art exact baseline, our implementation maintains the clustering quality while delivering up to a $36\times$ speedup due to performing fewer distance comparisons.
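The abstract assumes familiarity with Centroid-Linkage HAC itself. Below is a minimal sketch of the naive *exact* procedure, not the paper's approximate algorithm or its dynamic nearest-neighbor structure. Every cluster is represented by its centroid and size. At each step, the two clusters with the closest centroids (Euclidean distance) are merged, and the new centroid is the size-weighted mean of the two old ones. The function name `centroid_hac` and the SciPy-style numbering of merged clusters (`n`, `n+1`, ...) are illustrative choices, not taken from the paper. The brute-force pair scan costs O(k²) distance comparisons per step, which shows why reducing distance comparisons drives the reported speedup.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def centroid_hac(points: List[Point]) -> List[Tuple[int, int, float]]:
    """Naive exact centroid-linkage HAC (illustrative sketch only).

    Returns the merges as (cluster_a, cluster_b, centroid_distance).
    Input points are clusters 0..n-1; each merge creates a new cluster
    with the next unused id (n, n+1, ...), a hypothetical numbering scheme.
    """
    n = len(points)
    # Active cluster id -> (centroid, number of points in the cluster).
    active: Dict[int, Tuple[Point, int]] = {
        i: (tuple(float(x) for x in p), 1) for i, p in enumerate(points)
    }
    merges: List[Tuple[int, int, float]] = []
    next_id = n
    while len(active) > 1:
        # Brute-force scan over all active pairs: O(k^2) distance comparisons.
        best = None
        ids = sorted(active)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = math.dist(active[a][0], active[b][0])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        (ca, na), (cb, nb) = active.pop(a), active.pop(b)
        # The centroid of the union is the size-weighted mean of both centroids.
        merged = tuple((na * u + nb * v) / (na + nb) for u, v in zip(ca, cb))
        active[next_id] = (merged, na + nb)
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    # Two tight pairs far apart: each pair merges first, then the two pairs merge.
    print(centroid_hac([(0, 0), (0, 1), (5, 0), (5, 1)]))
```

On the four example points, the two vertical pairs merge at distance 1.0 and their centroids, (0, 0.5) and (5, 0.5), then merge at distance 5.0. Unlike single or average linkage, centroid linkage is not guaranteed to produce monotonically increasing merge distances, so a later merge can occur at a smaller distance than an earlier one.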
MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N Gowda, D Ellis Hershkowitz, Rajesh Jayaram, Jakub Łącki (2024)
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hierarchical Clustering | MNIST | Running Time (s) | 82.18 | 19 |
| Hierarchical Clustering | Birds | Runtime (s) | 79.45 | 19 |
| Hierarchical Agglomerative Clustering | Covertype | ARI | 0.547 | 2 |
| Clustering | | AMI | 0.426 | 2 |
| Clustering | Covertype | AMI | 0.163 | 2 |
| Hierarchical Agglomerative Clustering | | ARI | 6.4 | 2 |