Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data

About

Clustering mixed-type tabular data is fundamental to exploratory analysis, yet it remains challenging for three reasons: numerical and categorical features have misaligned representations, feature relevance is uneven and context-dependent, and explanations are typically produced post hoc, disconnected from the clustering process itself. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to identify multiple diverse, high-quality feature-weighting views, and a two-stage weight-aware clustering procedure that aggregates alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level explanations that remain consistent from individual instances up to whole clusters, backed by an additive decomposition guarantee. Extensive experiments on six real-world datasets show that WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and that it produces faithful, human-interpretable explanations grounded in the same primitives that drive the clustering.
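The abstract does not specify how BEP is constructed, so the following is only a hypothetical sketch of the general idea of placing mixed-type features in one shared binary space. It assumes that numerical columns are discretized into equal-width bins, that categories are indexed in sorted order, and that each feature's integer code is written in binary and zero-padded to a single shared width; a per-feature weighted Hamming distance stands in for weight-aware comparison. The function names (`feature_codes`, `encode`, `weighted_hamming`), the toy table, and the weights are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical sketch of aligning mixed-type features in one binary space.
# This is NOT the paper's BEP: the binning and code ordering are assumptions.


def feature_codes(column: Sequence, numeric: bool, n_bins: int = 4) -> List[int]:
    """Map one column to integer codes: equal-width bins for numeric data,
    sorted-category indices for categorical data."""
    if numeric:
        lo, hi = min(column), max(column)
        bin_width = (hi - lo) / n_bins or 1.0  # constant column: avoid division by zero
        return [min(int((v - lo) / bin_width), n_bins - 1) for v in column]
    vocab = {c: i for i, c in enumerate(sorted(set(column)))}
    return [vocab[v] for v in column]


def encode(table: Dict[str, Sequence], numeric_cols: Set[str],
           n_bins: int = 4) -> Tuple[List[str], int, List[List[int]]]:
    """Write every feature's code in binary, zero-padded to one shared width."""
    names = list(table)
    codes = {f: feature_codes(table[f], f in numeric_cols, n_bins) for f in names}
    # Shared width: enough bits for the largest code of any feature (at least 1).
    width = max(max(1, max(c).bit_length()) for c in codes.values())
    rows = []
    for r in range(len(table[names[0]])):
        bits: List[int] = []
        for f in names:
            bits.extend(int(b) for b in format(codes[f][r], f"0{width}b"))
        rows.append(bits)
    return names, width, rows


def weighted_hamming(a: List[int], b: List[int],
                     weights: Sequence[float], width: int) -> float:
    """Each feature contributes its weight times the fraction of its bits that differ."""
    total = 0.0
    for j, w in enumerate(weights):
        block_a = a[j * width:(j + 1) * width]
        block_b = b[j * width:(j + 1) * width]
        mismatches = sum(x != y for x, y in zip(block_a, block_b))
        total += w * mismatches / width
    return total


# Made-up toy table and hypothetical feature weights, for illustration only.
table = {"age": [23.0, 45.0, 61.0, 38.0], "job": ["clerk", "admin", "clerk", "tech"]}
names, width, rows = encode(table, numeric_cols={"age"})
dist = weighted_hamming(rows[0], rows[2], weights=[0.7, 0.3], width=width)
print(names, width, rows, dist)
```

One caveat of this reading: plain binary codes do not preserve bin order under Hamming distance (bins 1 and 2, coded `01` and `10`, differ in two bits, while bins 0 and 1 differ in one), so the paper's actual encoder may well be designed differently.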

Lehao Li, Qiang Huang, Yihao Ang, Bryan Kian Hsiang Low, Anthony K. H. Tung, Xiaokui Xiao · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mixed-type tabular clustering | Adult | ARI | 0.663 | 6
Mixed-type tabular clustering | Vermont | ARI | 0.283 | 6
Mixed-type tabular clustering | Arizona | ARI | 0.309 | 6
Mixed-type tabular clustering | Obesity | ARI | 0.222 | 6
Mixed-type tabular clustering | Credit | ARI | 0.328 | 6
Mixed-type tabular clustering | GeoNames | ARI | 0.146 | 6
Clustering | Adult, Vermont, Arizona, Obesity, Credit, and GeoNames Average Rank (test) | ARI | 1 | 6
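The per-dataset rows above report the Adjusted Rand Index (ARI), a chance-corrected measure of agreement between a predicted clustering and ground-truth labels: 1 means identical partitions (up to relabeling), and values near 0 mean chance-level agreement. The page does not say which implementation produced these numbers, so the sketch below is an illustrative from-scratch version of the standard contingency-table formula (scikit-learn exposes the same metric as `sklearn.metrics.adjusted_rand_score`); the function name `adjusted_rand_index` is our own.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of points placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition separately (marginals).
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings are one cluster, or both are
        # all singletons): treat as perfect agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # partial agreement
```

Because ARI is invariant to label permutations, a clustering method never needs to recover the original class IDs, only the grouping.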