Composite Silhouette: A Subsampling-based Aggregation Strategy

About

Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard micro-averaged form tends to favor larger clusters under size imbalance. Macro-averaging mitigates this bias by weighting clusters equally, but may overemphasize noise from under-represented groups. We introduce Composite Silhouette, an internal criterion for cluster-count selection that aggregates evidence across repeated subsampled clusterings rather than relying on a single partition. For each subsample, micro- and macro-averaged Silhouette scores are combined through an adaptive convex weight determined by their normalized discrepancy and smoothed by a bounded nonlinearity; the final score is then obtained by averaging these subsample-level composites. We establish key properties of the criterion and derive finite-sample concentration guarantees for its subsampling estimate. Experiments on synthetic and real-world datasets show that Composite Silhouette effectively reconciles the strengths of micro- and macro-averaging, yielding more accurate recovery of the ground-truth number of clusters.
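The aggregation described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the exact weight formula, so the normalization `|micro - macro| / (|micro| + |macro|)`, the use of `tanh` as the bounded nonlinearity, and the choice to shift weight toward the macro average as the discrepancy grows are all assumptions. The function names, the `cluster_fn` callback, and the default parameters are hypothetical as well.

```python
import math
import random
from collections import defaultdict


def silhouette_values(points, labels):
    """Per-point Silhouette values with Euclidean distance (O(n^2))."""
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1 or len(clusters) < 2:
            values.append(0.0)  # common convention for singletons / one cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def micro_macro(points, labels):
    """Micro average (over points) and macro average (over cluster means)."""
    s = silhouette_values(points, labels)
    micro = sum(s) / len(s)
    per_cluster = defaultdict(list)
    for value, lab in zip(s, labels):
        per_cluster[lab].append(value)
    macro = sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)
    return micro, macro


def composite(micro, macro, eps=1e-12):
    """Convex combination of micro and macro scores.

    ASSUMPTION: the weight is tanh of a normalized discrepancy and is
    placed on the macro score; the paper's actual formula may differ.
    """
    discrepancy = abs(micro - macro) / (abs(micro) + abs(macro) + eps)  # in [0, 1]
    w = math.tanh(discrepancy)  # bounded, smooth weight in [0, tanh(1)]
    return (1.0 - w) * micro + w * macro


def composite_silhouette(points, cluster_fn, k, n_subsamples=20, frac=0.8, seed=0):
    """Average the per-subsample composite score for a candidate k.

    cluster_fn(sample, k) is a hypothetical callback that returns one
    label per sampled point (for example, a k-means wrapper).
    """
    rng = random.Random(seed)
    m = min(len(points), max(k + 1, int(frac * len(points))))
    scores = []
    for _ in range(n_subsamples):
        sample = rng.sample(points, m)  # subsample without replacement
        labels = cluster_fn(sample, k)
        scores.append(composite(*micro_macro(sample, labels)))
    return sum(scores) / len(scores)
```

Under these assumptions, a cluster count would be selected by evaluating `composite_silhouette` for each candidate `k` and keeping the one with the highest score. Because the weight lies in `[0, 1]`, each per-subsample composite always falls between the micro and macro scores.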

Aggelos Semoglou, Aristidis Likas, John Pavlopoulos • 2026

Related benchmarks

Task                     Dataset  Result (Selected Cluster Count)  Rank
Cluster count selection  S1       5                                21
Cluster count selection  S3       5                                21
Cluster count selection  Wne      3                                16
Cluster count selection  Bld      6                                16
Cluster count selection  Dgt      10                               16
Cluster count selection  Nsg      24                               16
Cluster count selection  B77      82                               16
Cluster count selection  STL      10                               16
Cluster count selection  MDS      15                               16
Cluster count selection  S4       13                               16
Showing 10 of 33 rows
