Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering
About
State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with $\ell_1$, $\ell_2$ or nuclear norms. $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. $\ell_2$ and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed $\ell_1$, $\ell_2$ and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the $\ell_1$ and $\ell_2$ norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | MNIST (test) | NMI0.591 | 122 | |
| Clustering | MNIST (full) | Accuracy98 | 98 | |
| Clustering | Fashion MNIST | NMI70.5 | 95 | |
| Clustering | USPS | NMI68.4 | 82 | |
| Image Clustering | CIFAR-10 (full) | Accuracy0.613 | 33 | |
| Subspace Clustering | MNIST | Average Clustering Accuracy (%)93.79 | 32 | |
| Clustering | REUTERS 10K | ACC40.1 | 23 | |
| Subspace Clustering | Yale-B | ACC65.2 | 21 | |
| Subspace Clustering | ORL | NMI90.3 | 19 | |
| Clustering | Mice Protein | Accuracy0.434 | 15 |