
Chamfer-Linkage for Hierarchical Agglomerative Clustering

About

Hierarchical Agglomerative Clustering (HAC) is a widely used clustering method that repeatedly merges the closest pair of clusters, where inter-cluster distances are determined by a linkage function. Unlike many clustering methods, HAC does not optimize a single explicit global objective; clustering quality is therefore evaluated primarily empirically, and the choice of linkage function plays a crucial role in practice. However, popular classical linkages, such as single-linkage, average-linkage and Ward's method, show high variability across real-world datasets and do not consistently produce high-quality clusterings. In this paper, we propose Chamfer-linkage, a novel linkage function that measures the distance between clusters using the Chamfer distance, a popular notion of distance between point clouds in machine learning and computer vision. We argue that Chamfer-linkage satisfies desirable concept-representation properties that other popular measures struggle to satisfy. Theoretically, we show that Chamfer-linkage HAC can be implemented in $O(n^2)$ time, matching the efficiency of classical linkage functions. Experimentally, we find that Chamfer-linkage consistently yields higher-quality clusterings than classical linkages such as average-linkage and Ward's method across a diverse collection of datasets. Our results establish Chamfer-linkage as a practical drop-in replacement for classical linkage functions, broadening the toolkit for hierarchical clustering in both theory and practice.
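The abstract does not spell out the exact Chamfer-linkage formula, so the following is only a minimal sketch of the general idea, assuming a symmetric, size-normalized Chamfer distance between clusters A and B: the average distance from each point of A to its nearest point in B, plus the same quantity from B to A. The names `chamfer` and `chamfer_hac` are hypothetical. The loop is a naive HAC that re-evaluates every pair of active clusters at each step, so it does not reproduce the $O(n^2)$ algorithm claimed in the paper; it only illustrates "repeatedly merge the closest pair" with a Chamfer-based linkage.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def chamfer(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Assumed symmetric, size-normalized Chamfer distance between point sets A and B.

    Not necessarily the paper's exact definition.
    """
    # Average over A of the distance to the nearest point in B.
    a_to_b = sum(min(math.dist(a, b) for b in B) for a in A) / len(A)
    # Average over B of the distance to the nearest point in A.
    b_to_a = sum(min(math.dist(b, a) for a in A) for b in B) / len(B)
    return a_to_b + b_to_a


def chamfer_hac(points: Sequence[Point]) -> List[Tuple[int, int, float]]:
    """Naive HAC that always merges the pair of clusters with the smallest Chamfer distance.

    Returns merges as (id_a, id_b, distance). Singletons have ids 0..n-1 and each
    merged cluster receives the next unused id (n, n+1, ...).
    """
    n = len(points)
    clusters: Dict[int, List[Point]] = {i: [p] for i, p in enumerate(points)}
    next_id = n
    merges: List[Tuple[int, int, float]] = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Scan every pair of active clusters (naive; far slower than O(n^2) overall).
        for pos, i in enumerate(ids):
            for j in ids[pos + 1:]:
                d = chamfer(clusters[i], clusters[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        i, j, d = best
        # Replace the two closest clusters with their union under a fresh id.
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        merges.append((i, j, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    for merge in chamfer_hac(pts):
        print(merge)
```

On the four example points, the two nearby pairs merge first and the two resulting clusters merge last, mirroring the dendrogram any reasonable linkage would produce on such well-separated data.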

Kishen N Gowda, Willem Fletcher, MohammadHossein Bateni, Laxman Dhulipala, D Ellis Hershkowitz, Rajesh Jayaram, Jakub Łącki • 2026

Related benchmarks

Task | Dataset | Result | Rank
Image Clustering | CIFAR-10 | NMI 0.767 | 318
Clustering | Fashion MNIST | NMI 71.6 | 107
Clustering | Wine | ARI 0.402 | 43
Clustering | Iris | ARI 0.775 | 29
Hierarchical Agglomerative Clustering | Wine | -- | 26
Clustering | CIFAR-100 | NMI 64.9 | 25
Clustering | MNIST | NMI 85.3 | 24
Clustering | Digits | ARI 0.875 | 23
Hierarchical Agglomerative Clustering | Iris | -- | 20
Hierarchical Agglomerative Clustering | Digits | -- | 20

Showing 10 of 39 rows.
