A Generalized Framework for Predictive Clustering and Optimization

About

Clustering is a powerful and extensively used data science tool. While clustering is generally thought of as an unsupervised learning technique, there are also supervised variations such as Spath's clusterwise regression that attempt to find clusters of data that yield low regression error on a supervised target. We believe that clusterwise regression is just a single vertex of a largely unexplored design space of supervised clustering models. In this article, we define a generalized optimization framework for predictive clustering that admits different cluster definitions (arbitrary point assignment, closest center, and bounding box) and both regression and classification objectives. We then present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization in this generalized framework. To alleviate scalability concerns for large datasets, we also provide highly scalable greedy algorithms inspired by the Majorization-Minimization (MM) framework. Finally, we demonstrate the ability of our models to uncover different interpretable discrete cluster structures in data by experimenting with four real-world datasets.

Aravinth Chembu, Scott Sanner • 2023
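
For readers unfamiliar with the clusterwise regression mentioned in the abstract, the idea can be illustrated with a short alternating heuristic: fit one least-squares model per cluster, then reassign each point to the model that predicts it best, and repeat. This is only a minimal sketch of the classic arbitrary-point-assignment variant, not the authors' MILP formulation or their MM-inspired algorithms. The function name `clusterwise_regression` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def clusterwise_regression(X, y, k, n_iter=50, seed=0):
    """Greedy clusterwise linear regression (illustrative sketch only).

    Alternates per-cluster least-squares fits with reassignment of each
    point to the cluster whose model gives the smallest squared residual.
    Assumes n_iter >= 1 and that X is a 2-D array of shape (n, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])   # append an intercept column
    labels = rng.integers(0, k, size=n)    # random initial assignment
    coefs = np.zeros((k, Xb.shape[1]))

    for _ in range(n_iter):
        # Fit step: ordinary least squares on each cluster's points.
        for c in range(k):
            mask = labels == c
            if mask.sum() >= Xb.shape[1]:  # skip clusters too small to fit
                coefs[c], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)

        # Assignment step: squared residual of every point under every model.
        residuals = (Xb @ coefs.T - y[:, None]) ** 2
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # assignments are stable
        labels = new_labels

    # Total sum of squared errors under the final assignment and models.
    residuals = (Xb @ coefs.T - y[:, None]) ** 2
    sse = float(residuals[np.arange(n), labels].sum())
    return labels, coefs, sse
```

Calling `clusterwise_regression(X, y, k=2)` returns the cluster labels, one coefficient row per cluster (intercept last), and the total SSE. Like any greedy scheme of this kind, it can stall in a local optimum that depends on the random initial assignment, which is the gap a global MILP formulation is meant to close.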

Related benchmarks

Task              Dataset                           Metric                             Result    Rank
Clustering        liver-disorders                   Regression Coefficient Difference  1.5192    3
Clustering        Student Performance 320           Regression Coefficient Difference  5.5334    3
Clustering        Parkinsons Telemonitoring 189     Regression Coefficient Difference  76.4961   3
Clustering        Infrared Thermography (925)       Regression Coefficient Difference  193.4     3
Clustering        Student Performance               Cluster Assignment Mismatch        43.07     3
LPC Optimization  Infrared Thermography UCI ID 925  SSE                                0.1029    3
Clustering        Stock Portfolio Performance 390   Regression Coefficient Difference  49.7234   3
Clustering        Solar Flare 89                    Regression Coefficient Difference  3.8343    3
Clustering        Productivity Prediction 597       Regression Coefficient Difference  22.2822   3
Clustering        Facebook Metrics                  Regression Coefficient Difference  0.5997    3
Showing 10 of 27 rows