Individual-heterogeneous sub-Gaussian Mixture Models
About
The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.
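To make the setting concrete, here is a minimal sketch, not the paper's algorithm. The abstract does not specify the model's exact form or the spectral method, so both are assumptions here. The sketch assumes the heterogeneity parameter `theta_i` multiplies the cluster center, uses Gaussian noise as one example of sub-Gaussian noise, and clusters with a generic row-normalized spectral step (top-K left singular vectors, then a small k-means). The names `simulate`, `spectral_cluster`, and `accuracy` are hypothetical.

```python
import itertools
import numpy as np


def simulate(n=150, p=500, k=3, seed=0):
    """Draw n points in p > n dimensions with per-observation scales (assumed model)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)            # true cluster labels
    centers = 2.0 * rng.standard_normal((k, p))    # one center per cluster
    theta = rng.uniform(0.5, 2.0, size=n)          # individual heterogeneity (assumption)
    X = theta[:, None] * centers[labels] + rng.standard_normal((n, p))
    return X, labels


def spectral_cluster(X, k, n_iter=50):
    """Generic normalized spectral clustering; a stand-in, not the authors' method."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, :k]                                        # top-k left singular vectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)    # row-normalize to cancel theta_i

    # Deterministic farthest-point initialization for k-means.
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)

    # Plain Lloyd iterations on the normalized rows.
    for _ in range(n_iter):
        dist = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):                     # keep old center if cluster is empty
                centers[j] = U[assign == j].mean(axis=0)
    return assign


def accuracy(pred, truth, k):
    """Best agreement with the true labels over all relabelings of the clusters."""
    return max(
        np.mean(np.array(perm)[pred] == truth)
        for perm in itertools.permutations(range(k))
    )


if __name__ == "__main__":
    X, z = simulate()
    z_hat = spectral_cluster(X, k=3)
    print("clustering accuracy:", accuracy(z_hat, z, 3))
```

Row normalization is the design choice that matters in this sketch: under the assumed multiplicative model, rows of the singular-vector matrix that belong to the same cluster point in roughly the same direction but have lengths proportional to `theta_i`, so projecting them onto the unit sphere removes the individual scale before k-means is applied.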
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | Iris | Misclassification Count | 7 | 3 |
| Classification | DNA | Misclassification Count | 353 | 3 |
| Handwritten digit classification | usps (train) | Error Count | 2.03e+3 | 3 |
| Handwritten digit classification | pendigits (train) | Misclassification Rate | 0.2565 | 3 |
| Image Segment Classification | segment | Misclassification Count | 735 | 3 |
| Satellite Image Classification | satimage (train) | Misclassification Count | 1.38e+3 | 3 |
| Classification | Wine | Misclassification Count | 9 | 3 |
| Classification | SEEDS | Misclassification Count | 20 | 3 |