
Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

About

Hierarchical clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by Dasgupta (2016). We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$ additive error for an input dataset $V$. We then exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/\epsilon)$ additive error, and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, with a separation assumption on the blocks, propose a private $1+o(1)$ approximation algorithm that also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.
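For readers unfamiliar with the objective in Dasgupta's framework, the following is a minimal sketch of how the cost of a hierarchy can be computed. It assumes the standard definition: each weighted edge $(i, j)$ contributes $w_{ij}$ times the number of leaves under the lowest common ancestor of $i$ and $j$. The nested-tuple tree encoding, the `weights` dictionary, and the function names are illustrative assumptions, not taken from the paper, and the sketch involves no privacy mechanism.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a nested-tuple tree (leaves are non-tuples)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weights):
    """Sum over edges (i, j) of w_ij * (number of leaves under lca(i, j)).

    `weights` maps unordered pairs, stored as (i, j) tuples, to similarity
    weights; pairs that are absent are treated as weight 0.
    """
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    cost = 0.0
    # Pairs separated at this node have this node as their lowest common ancestor.
    for group_a, group_b in combinations(child_leaves, 2):
        for i in group_a:
            for j in group_b:
                w = weights.get((i, j), weights.get((j, i), 0.0))
                cost += w * size
    return cost + sum(dasgupta_cost(child, weights) for child in tree)


# Hypothetical toy graph: two tight pairs joined by one weak edge.
weights = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.1}
good_tree = ((0, 1), (2, 3))  # keeps the heavy edges low in the tree
bad_tree = ((0, 2), (1, 3))   # splits the heavy edges at the root

print(dasgupta_cost(good_tree, weights))
print(dasgupta_cost(bad_tree, weights))
```

On this toy input the first tree costs about 4.4 and the second about 8.4, which illustrates why lower cost corresponds to merging similar points early. The additive error bounds in the abstract are stated with respect to this kind of cost.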

Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hierarchical Agglomerative Clustering | Wine | Dendrogram Purity: 0.895 | 26 |
| Hierarchical Agglomerative Clustering | Digits | Dendrogram Purity: 0.81 | 20 |
| Hierarchical Agglomerative Clustering | Iris | Dendrogram Purity: 0.829 | 20 |
| Hierarchical Clustering | Spambase | Dasgupta's Cost: 3.43e+7 | 10 |
| Hierarchical Clustering | OpticalDigits | Dasgupta Cost: 2.87e+5 | 10 |
| Hierarchical Clustering | Spambase | DP: 61 | 10 |
| Hierarchical Clustering | zoo | DP: 0.936 | 10 |
| Hierarchical Clustering | Br. Cancer | DP Score: 92.9 | 10 |
