Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data

About

Clustering mixed-type tabular data is fundamental for exploratory analysis, yet remains challenging due to misaligned numerical-categorical representations, uneven and context-dependent feature relevance, and explanations that are post hoc and disconnected from the clustering process. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to sense multiple high-quality and diverse feature-weighting views, and a two-stage weight-aware clustering procedure to aggregate alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level explanations that are consistent from instances to clusters with an additive decomposition guarantee. Extensive experiments on six real-world datasets demonstrate that WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and produces faithful, human-interpretable explanations grounded in the same primitives that drive clustering.

Lehao Li, Qiang Huang, Yihao Ang, Bryan Kian Hsiang Low, Anthony K. H. Tung, Xiaokui Xiao • 2026

Related benchmarks

Task	Dataset	Result
Mixed-type tabular clustering	Adult	ARI 0.663	6
Mixed-type tabular clustering	Vermont	ARI 0.283	6
Mixed-type tabular clustering	Arizona	ARI 0.309	6
Mixed-type tabular clustering	Obesity	ARI 0.222	6
Mixed-type tabular clustering	Credit	ARI 0.328	6
Mixed-type tabular clustering	GeoNames	ARI 0.146	6
Clustering	Adult, Vermont, Arizona, Obesity, Credit, and GeoNames Average Rank (test)	ARI 1	6
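
For readers unfamiliar with the metric, the sketch below shows how a score of this kind can be computed, assuming ARI in the table denotes the standard Adjusted Rand Index between a predicted partition and reference labels. The function name `adjusted_rand_index` and the toy label lists are illustrative and not taken from the paper; the paper's own evaluation code may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat partitions of the same items.

    Illustrative helper (not from the paper). Returns 1.0 for identical
    partitions up to relabeling and about 0.0 for chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Contingency counts and the two marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that are co-clustered in both, or in each, partition.
    pairs_both = sum(comb(c, 2) for c in cells.values())
    pairs_true = sum(comb(c, 2) for c in rows.values())
    pairs_pred = sum(comb(c, 2) for c in cols.values())

    expected = pairs_true * pairs_pred / total_pairs
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (maximum - expected)


# Toy labels for illustration only (not data from the benchmarks above).
reference = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same grouping, different cluster ids
crossed = [0, 1, 0, 1]     # splits every reference cluster
score_same = adjusted_rand_index(reference, relabeled)
score_crossed = adjusted_rand_index(reference, crossed)
```

Because the index is corrected for chance, it is invariant to how cluster ids are numbered and can be negative when agreement is worse than expected at random, which is why it suits comparing unsupervised partitions against reference labels.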

