Accelerated projected gradient algorithms for sparsity constrained optimization problems
About
We consider the projected gradient algorithm for the nonconvex best subset selection problem, which minimizes a given empirical loss function under an $\ell_0$-norm constraint. By decomposing the feasible set defined by the sparsity constraint into a finite union of linear subspaces, we present two acceleration schemes with global convergence guarantees: one by same-space extrapolation and the other by subspace identification. The former fully exploits the problem structure to greatly accelerate optimization at only negligible additional cost. The latter leads to a two-stage meta-algorithm that first uses classical projected gradient iterations to identify the subspace containing an optimal solution, and then switches to a highly efficient smooth optimization method within the identified subspace to attain superlinear convergence. Experiments demonstrate that the proposed accelerated algorithms are orders of magnitude faster than their non-accelerated counterparts as well as the state of the art.
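To make the same-space extrapolation idea concrete, below is a minimal Python sketch of projected gradient descent under an $\ell_0$ constraint (iterative hard thresholding), where a momentum step is applied only when two consecutive iterates share the same support, i.e. lie in the same linear subspace of the decomposed feasible set. The function names (`hard_threshold`, `accelerated_iht`), the fixed momentum parameter `beta`, and the same-support restart rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def hard_threshold(x, s):
    """Project x onto {z : ||z||_0 <= s} by keeping the s
    largest-magnitude entries and zeroing out the rest."""
    z = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -s)[-s:]
    z[idx] = x[idx]
    return z

def accelerated_iht(grad_f, x0, s, step, beta=0.9, max_iter=500, tol=1e-8):
    """Projected gradient with a hypothetical same-space extrapolation:
    momentum is used only when consecutive iterates have identical
    supports; otherwise we fall back to a plain projected gradient step."""
    x_prev = x0.copy()
    x = hard_threshold(x0 - step * grad_f(x0), s)
    for _ in range(max_iter):
        same_support = np.array_equal(x != 0, x_prev != 0)
        y = x + beta * (x - x_prev) if same_support else x
        x_next = hard_threshold(y - step * grad_f(y), s)
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x_prev, x = x, x_next
    return x

# Toy sparse least squares: f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 300))
x_true = np.zeros(300)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true
grad = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for the L-smooth loss
x_hat = accelerated_iht(grad, np.zeros(300), s=5, step=step)
```

Once the support stabilizes, the subspace-identification variant described above would instead fix the identified support and hand the restricted smooth problem to a fast second-order solver (e.g. Newton-type iterations on the nonzero coordinates) to obtain superlinear convergence.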
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sparse Classification | gisette scale | CPU Cost | 0.07 | 40 |
| Sparsity-constrained Optimization | duke | CPU Time | 0.01 | 40 |
| Regression | E2006 tfidf | MSE | 0.139 | 25 |
| Classification | colon-cancer | CPU | 0.01 | 20 |
| Classification | duke | CPU Cost | 0.01 | 20 |
| Classification | leukemia | CPU Usage | 0.01 | 20 |
| Regression | E2006-log1p (test) | CPU Time | 19.5 | 20 |
| Sparse Classification | colon-cancer | CPU Time | 0.02 | 20 |
| Sparse Classification | duke | CPU | 0.01 | 20 |
| Sparse Classification | leukemia | CPU | 0.01 | 20 |