Chamfer-Linkage for Hierarchical Agglomerative Clustering
About
Hierarchical Agglomerative Clustering (HAC) is a widely-used clustering method based on repeatedly merging the closest pair of clusters, where inter-cluster distances are determined by a linkage function. Unlike many clustering methods, HAC does not optimize a single explicit global objective; clustering quality is therefore primarily evaluated empirically, and the choice of linkage function plays a crucial role in practice. However, popular classical linkages, such as single-linkage, average-linkage and Ward's method show high variability across real-world datasets and do not consistently produce high-quality clusterings in practice. In this paper, we propose \emph{Chamfer-linkage}, a novel linkage function that measures the distance between clusters using the Chamfer distance, a popular notion of distance between point-clouds in machine learning and computer vision. We argue that Chamfer-linkage satisfies desirable concept representation properties that other popular measures struggle to satisfy. Theoretically, we show that Chamfer-linkage HAC can be implemented in $O(n^2)$ time, matching the efficiency of classical linkage functions. Experimentally, we find that Chamfer-linkage consistently yields higher-quality clusterings than classical linkages such as average-linkage and Ward's method across a diverse collection of datasets. Our results establish Chamfer-linkage as a practical drop-in replacement for classical linkage functions, broadening the toolkit for hierarchical clustering in both theory and practice.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Image Clustering | CIFAR-10 | NMI0.767 | 243 | |
| Clustering | Fashion MNIST | NMI71.6 | 95 | |
| Clustering | Wine | ARI0.402 | 34 | |
| Clustering | Iris | ARI0.775 | 29 | |
| Hierarchical Agglomerative Clustering | Wine | -- | 26 | |
| Clustering | MNIST | NMI85.3 | 24 | |
| Hierarchical Agglomerative Clustering | Iris | -- | 20 | |
| Hierarchical Agglomerative Clustering | Digits | -- | 20 | |
| Clustering | CIFAR-100 | ARI31.8 | 19 | |
| Hierarchical Clustering | MNIST | Running Time (s)400.8 | 19 |