Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers
About
Federated Clustering (FC) is an emerging and promising solution in exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients are with a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce usable information, making the development of robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-$k^*$-HC, which can automatically determine an optimal number of clusters $k^*$ based on the data distribution explored through hierarchical clustering. To obtain the global data distribution for $k^*$ determination, we let each client generate micro-subclusters. Their prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows exploring clusters of varying sizes and shapes, and the progressive merging process can self-terminate according to the neighboring relationships among the prototypes to determine $k^*$. Extensive experiments on diverse datasets demonstrate the FC capability of the proposed Fed-$k^*$-HC in accurately exploring a proper number of clusters.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | Breast | ARI0.8384 | 24 | |
| Clustering | pageblock | F-measure90.4 | 9 | |
| Clustering | Yeast | F-measure57.91 | 9 | |
| Clustering | Abalone | F-measure68.55 | 9 | |
| Clustering | Digits | F-measure70.78 | 9 | |
| Clustering | Gaussian Synthetic | F-measure98.63 | 9 | |
| Clustering | ids2 synthetic | F-measure99.35 | 9 | |
| Federated Clustering | ids2 non-iid synthetic (test) | F-measure99.84 | 9 | |
| Federated Clustering | gaussian_non_iid synthetic (test) | F-measure99.6 | 9 |