Imbalanced Mixed Linear Regression
About
We consider the problem of mixed linear regression (MLR), where each observed sample belongs to one of $K$ unknown linear models. In practical applications, the proportions of the $K$ components are often imbalanced. Unfortunately, most MLR methods do not perform well in such settings. Motivated by this practical challenge, in this work we propose Mix-IRLS, a novel, simple and fast algorithm for MLR with excellent performance on both balanced and imbalanced mixtures. In contrast to popular approaches that recover the $K$ models simultaneously, Mix-IRLS does it sequentially using tools from robust regression. Empirically, Mix-IRLS succeeds in a broad range of settings where other methods fail. These include imbalanced mixtures, small sample sizes, presence of outliers, and an unknown number of models $K$. In addition, Mix-IRLS outperforms competing methods on several real-world datasets, in some cases by a large margin. We complement our empirical results by deriving a recovery guarantee for Mix-IRLS, which highlights its advantage on imbalanced mixtures.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mixed Linear Regression | medical | Minimal Error (K=2)0.1594 | 5 | |
| Mixed Linear Regression | Wine | Minimal Error (K=2)0.4827 | 5 | |
| Mixed Linear Regression | WHO | Minimal Error (K=2)0.2276 | 5 | |
| Mixed Linear Regression | Fish | Minimal Error (K=2)0.1656 | 5 |