Hierarchical clustering that takes advantage of both density-peak and density-connectivity
About
This paper focuses on density-based clustering, particularly the Density Peak (DP) algorithm and the density-connectivity-based DBSCAN algorithm, and proposes a new method that combines the individual strengths of the two to yield a density-based hierarchical clustering algorithm. Our investigation begins by formally defining the types of clusters that DP and DBSCAN are designed to detect, and then identifies the kinds of distributions for which DP and DBSCAN individually fail to detect all clusters in a dataset. These weaknesses motivate us to formally define a new kind of cluster and to propose a new method, called DC-HDP, that overcomes them and identifies clusters with arbitrary shapes and varied densities. In addition, the new method produces a richer clustering result in the form of a hierarchy (dendrogram), which gives a better understanding of the cluster structure. Our empirical evaluation shows that DC-HDP produces the best clustering results on 14 datasets in comparison with 7 state-of-the-art clustering algorithms.
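
To make the density-peak idea concrete, the sketch below computes the two per-point quantities that the DP algorithm is commonly described as using: a local density `rho` (here, the number of neighbours closer than a cutoff `d_c`) and `delta`, the distance to the nearest point of higher density. This is an illustration of plain DP under those assumptions, not of DC-HDP, whose details are not given in the abstract. The function name `density_peak_scores` and the return layout are hypothetical.

```python
import math

def density_peak_scores(points, d_c):
    """Return (rho, delta, nearest_higher) for a list of coordinate tuples.

    Assumes the cutoff-kernel density: rho[i] counts the other points
    that lie strictly closer than d_c to point i.
    """
    n = len(points)
    # Full pairwise Euclidean distance table (O(n^2), fine for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = [0.0] * n
    nearest_higher = [None] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Distance to the closest point with strictly higher density.
            j = min(higher, key=lambda k: dist[i][k])
            delta[i] = dist[i][j]
            nearest_higher[i] = j
        else:
            # No denser point exists: by convention use the largest distance.
            delta[i] = max(dist[i])
    return rho, delta, nearest_higher
```

Points with both large `rho` and large `delta` are the candidate cluster centres (density peaks); every other point can then follow its `nearest_higher` link to reach a peak. Points tied for the highest density all fall into the "no denser point" branch in this sketch.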
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hierarchical Agglomerative Clustering | Wine | Dendrogram Purity | 0.83 | 26 |
| Hierarchical Clustering | banknote | Dendrogram Purity | 98 | 6 |
| Hierarchical Clustering | ALLAML | Dendrogram Purity | 73 | 6 |
| Hierarchical Clustering | ImageNet-10 | Dendrogram Purity | 84 | 6 |
| Hierarchical Clustering | STL-10 | Dendrogram Purity | 0.59 | 6 |
| Hierarchical Clustering | LSVT | Dendrogram Purity | 65 | 6 |
| Hierarchical Clustering | musk | Dendrogram Purity | 54 | 6 |
| Hierarchical Clustering | SEEDS | Dendrogram Purity | 82 | 6 |
| Hierarchical Clustering | WDBC | Dendrogram Purity | 84 | 6 |
| Hierarchical Clustering | LandCover | Dendrogram Purity | 43 | 6 |
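
For readers unfamiliar with the metric in the table, the following is a minimal sketch of dendrogram purity as it is usually defined: for every pair of points sharing a ground-truth class, take the smallest subtree containing both and measure the fraction of its leaves that belong to that class, then average over all such pairs. The tree encoding (nested 2-tuples with integer leaf indices) and the function name `dendrogram_purity` are assumptions made for illustration; how the listed results were actually computed is not stated here. The function returns a fraction in [0, 1], whereas the table appears to mix fractions and percentages.

```python
def dendrogram_purity(tree, labels):
    """Dendrogram purity of a binary tree over labelled points.

    tree   : an int leaf index, or a 2-tuple (left_subtree, right_subtree)
    labels : labels[i] is the ground-truth class of leaf i
    """
    total = 0.0   # sum of subtree purities over same-class pairs
    pairs = 0     # number of same-class pairs

    def visit(node):
        nonlocal total, pairs
        if isinstance(node, int):
            return [node]
        left = visit(node[0])
        right = visit(node[1])
        members = left + right
        # A pair split across the two children has this node as its
        # lowest common ancestor.
        for a in left:
            for b in right:
                if labels[a] == labels[b]:
                    same = sum(1 for m in members if labels[m] == labels[a])
                    total += same / len(members)
                    pairs += 1
        return members

    visit(tree)
    return total / pairs if pairs else 0.0
```

A tree that merges same-class points before anything else scores 1.0, while a tree that only brings same-class points together at the root scores the class proportion at the root.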