Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning

About

Deep learning methods are primarily proposed for supervised learning of images or text with limited applications to clustering problems. In contrast, tabular data with heterogeneous features pose unique challenges in representation learning, where deep learning has yet to replace traditional machine learning. This paper addresses these challenges in developing one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep clustering framework for learning the parameters of multivariate Gaussian cluster distributions by iteratively updating individual cluster weights. The G-CEALS method presents average rank orderings of 2.9(1.7) and 2.8(1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen tabular data sets, respectively, and outperforms nine state-of-the-art clustering methods. G-CEALS substantially improves clustering performance compared to traditional K-means and GMM, which are still de facto methods for clustering tabular data. Similar computationally efficient and high-performing deep clustering frameworks are imperative to reap the myriad benefits of deep learning on tabular data over traditional machine learning.
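The adjusted Rand index (ARI) cited above is also the metric used in the benchmark results, so a small worked example may help. The sketch below implements the standard pair-counting definition of ARI in plain Python. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not code from the paper, and the paper's own evaluation pipeline may differ in details such as score scaling.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting ARI (illustrative helper, not the authors' code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two samples: treat the partitions as trivially identical.
        return 1.0

    # Contingency counts: samples per (true class, predicted cluster) cell,
    # plus the row and column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs that fall together in each cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Expected agreement under random labeling with fixed marginals.
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)

    if max_index == expected:
        # Degenerate case (e.g. every sample in one cluster in both labelings).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy usage: ARI ignores how cluster IDs are named.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI is corrected for chance, a clustering that matches the ground truth up to a renaming of cluster IDs scores 1, while one that agrees less than random assignment would scores below 0.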

Shourav B. Rabbani, Ivan V. Medri, Manar D. Samad • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Clustering | STL-10 | -- | 282 |
| Clustering | Statlog | ARI 41.1 | 30 |
| Clustering | Yeast | ARI 10.1 | 29 |
| Clustering | TUANDROMD | ARI 0.4 | 20 |
| Clustering | pima | ARI 0.06 | 20 |
| Clustering | Mice Protein | ARI 0.253 | 20 |
| Clustering | phoneme | ARI 5 | 20 |
| Clustering | Shuttle | ARI 0.316 | 20 |
| Clustering | Rice | ARI 0.084 | 20 |
| Clustering | PenBased | ARI 0.557 | 20 |
