Kempe Swap K-Means: A Scalable Near-Optimal Solution for Semi-Supervised Clustering
About
This paper presents a novel centroid-based heuristic algorithm, termed Kempe Swap K-Means, for constrained clustering under rigid must-link (ML) and cannot-link (CL) constraints. The algorithm employs a dual-phase iterative process: an assignment step that utilizes Kempe chain swaps to refine current clustering in the constrained solution space and a centroid update step that computes optimal cluster centroids. To enhance global search capabilities and avoid local optima, the framework incorporates controlled perturbations during the update phase. Empirical evaluations demonstrate that the proposed method achieves near-optimal partitions while maintaining high computational efficiency and scalability. The results indicate that Kempe Swap K-Means consistently outperforms state-of-the-art benchmarks in both clustering accuracy and algorithmic efficiency for large-scale datasets.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Constrained Clustering | COL2 | CPU Time (multiples of KSKM)1 | 130 | |
| Constrained Clustering | COL2 | Min Inertia (Relative to KSKM)1 | 70 | |
| Clustering | COL2 | Clustering Inertia (KSKM multiple)0.98 | 60 | |
| Constrained Clustering | COL2 | Success Rate100 | 60 | |
| Constrained Clustering | COL1 | CPU Time (multiples of KSKM)1 | 52 | |
| Constrained Clustering | COL1 | Minimum Clustering Inertia (Relative to KSKM)1 | 28 | |
| Clustering | CIFAR-100 | -- | 25 | |
| Clustering | COL1 | Clustering Inertia (KSKM Multiple)1 | 24 | |
| Clustering | COL3 | Clustering Inertia (Multiple of KSKM)0.98 | 24 | |
| Constrained Clustering | COL1 | Success Rate100 | 24 |