Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
About
This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function. For any input graph $G$ with a clear cluster-structure, our designed algorithms run in nearly-linear time in the input size of $G$, and return an $O(1)$-approximate HC tree with respect to Dasgupta's cost function. We compare the performance of our algorithm against the previous state-of-the-art on synthetic and real-world datasets and show that our designed algorithm produces comparable or better HC trees with much lower running time.
Steinar Laenen, Bogdan-Adrian Manghiuc, He Sun• 2023
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Hierarchical Agglomerative Clustering | Wine | Dendrogram Purity0.95 | 26 | |
| Hierarchical Agglomerative Clustering | Digits | Dendrogram Purity0.859 | 20 | |
| Hierarchical Agglomerative Clustering | Iris | Dendrogram Purity0.832 | 20 | |
| Hierarchical Clustering | OpticalDigits | Dasgupta Cost2.58e+5 | 10 | |
| Hierarchical Clustering | Br. Cancer | DP Score95.1 | 10 | |
| Hierarchical Clustering | zoo | DP0.954 | 10 | |
| Hierarchical Clustering | Spambase | DP55.1 | 10 | |
| Hierarchical Clustering | Spambase | Dasgupta's Cost1.01e+8 | 10 | |
| Hierarchical Clustering | pendigits | DP65.3 | 9 | |
| Hierarchical Clustering | SEEDS | Dendrogram Purity88 | 6 |
Showing 10 of 20 rows