Variational Fair Clustering
About
We propose a general variational framework of fair clustering, which integrates an original Kullback-Leibler (KL) fairness term with a large class of clustering objectives, including prototype or graph based. Fundamentally different from the existing combinatorial and spectral solutions, our variational multi-term approach enables to control the trade-off levels between the fairness and clustering objectives. We derive a general tight upper bound based on a concave-convex decomposition of our fairness term, its Lipschitz-gradient property and the Pinsker's inequality. Our tight upper bound can be jointly optimized with various clustering objectives, while yielding a scalable solution, with convergence guarantee. Interestingly, at each iteration, it performs an independent update for each assignment variable. Therefore, it can be easily distributed for large-scale datasets. This scalability is important as it enables to explore different trade-off levels between the fairness and clustering objectives. Unlike spectral relaxation, our formulation does not require computing its eigenvalue decomposition. We report comprehensive evaluations and comparisons with state-of-the-art methods over various fair-clustering benchmarks, which show that our variational formulation can yield highly competitive solutions in terms of fairness and clustering objectives.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | Reverse MNIST | Balance100 | 12 | |
| Clustering | MTFL | Balance98.2 | 12 | |
| Fair Graph Clustering | SBM uni. xi in [0, 0.2], n = 5000 | Clustering Error (CE)0.75 | 12 | |
| Fair Graph Clustering | SBM uni. xi in [0.4, 0.6], n = 5000 | Clustering Error (CE)0.77 | 12 | |
| Fair Graph Clustering | SBM uni. xi in [0, 0.2], n = 1000 | Clustering Error (CE)0.88 | 12 | |
| Fair Graph Clustering | SBM uni. xi in [0.4, 0.6], n = 1000 | Cross-Entropy0.89 | 12 |