Composite Silhouette: A Subsampling-based Aggregation Strategy
About
Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard micro-averaged form tends to favor larger clusters under size imbalance. Macro-averaging mitigates this bias by weighting clusters equally, but may overemphasize noise from under-represented groups. We introduce Composite Silhouette, an internal criterion for cluster-count selection that aggregates evidence across repeated subsampled clusterings rather than relying on a single partition. For each subsample, micro- and macro-averaged Silhouette scores are combined through an adaptive convex weight determined by their normalized discrepancy and smoothed by a bounded nonlinearity; the final score is then obtained by averaging these subsample-level composites. We establish key properties of the criterion and derive finite-sample concentration guarantees for its subsampling estimate. Experiments on synthetic and real-world datasets show that Composite Silhouette effectively reconciles the strengths of micro- and macro-averaging, yielding more accurate recovery of the ground-truth number of clusters.
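The aggregation described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the normalized discrepancy `d`, the use of `tanh` as the bounded nonlinearity, the sharpness parameter `alpha`, the direction of the weighting, and the function names `micro_macro`, `composite`, and `composite_silhouette` are all hypothetical. The sketch takes per-point Silhouette values and cluster labels as given for each subsample, and does not perform the clustering or subsampling itself.

```python
import math
from collections import defaultdict
from statistics import fmean


def micro_macro(sil_values, labels):
    """Return (micro, macro) averages of per-point Silhouette values."""
    by_cluster = defaultdict(list)
    for s, c in zip(sil_values, labels):
        by_cluster[c].append(s)
    # Micro: every point counts equally, so large clusters dominate.
    micro = fmean(sil_values)
    # Macro: every cluster counts equally, regardless of its size.
    macro = fmean([fmean(v) for v in by_cluster.values()])
    return micro, macro


def composite(micro, macro, alpha=5.0, eps=1e-12):
    """Convex combination of micro and macro scores.

    ASSUMPTION: the discrepancy normalization and the tanh-based weight
    are illustrative choices, not the paper's definitions.
    """
    # Normalized discrepancy, bounded in [-1, 1].
    d = (macro - micro) / (abs(macro) + abs(micro) + eps)
    # Bounded nonlinearity maps d to a weight in (0, 1).
    w = 0.5 * (1.0 + math.tanh(alpha * d))
    return w * macro + (1.0 - w) * micro


def composite_silhouette(subsamples, alpha=5.0):
    """Average the subsample-level composites.

    `subsamples` is an iterable of (sil_values, labels) pairs,
    one pair per subsampled clustering.
    """
    return fmean(
        [composite(*micro_macro(s, l), alpha=alpha) for s, l in subsamples]
    )
```

Because the weight lies strictly between 0 and 1, each subsample-level composite stays between the micro- and macro-averaged scores, and coincides with both when they agree. In practice this score would be computed for each candidate cluster count, and the count with the highest value selected.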
Related benchmarks
| Task | Dataset | Selected cluster count | Rank |
|---|---|---|---|
| Cluster count selection | S1 | 5 | 21 |
| Cluster count selection | S3 | 5 | 21 |
| Cluster count selection | Wne | 3 | 16 |
| Cluster count selection | Bld | 6 | 16 |
| Cluster count selection | Dgt | 10 | 16 |
| Cluster count selection | Nsg | 24 | 16 |
| Cluster count selection | B77 | 82 | 16 |
| Cluster count selection | STL | 10 | 16 |
| Cluster count selection | MDS | 15 | 16 |
| Cluster count selection | S4 | 13 | 16 |