A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation
About
Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA, which is based on the basic SDP relaxation. Our algorithm takes an (approximate) SDP solution, constructs one deterministic sparse solution and several randomized solutions, and outputs the best among them. Our algorithm has an approximation ratio of at most the sparsity constant with high probability, if called enough times. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank, or has exponentially decaying eigenvalues. We then present two classes of instances for which this technical assumption holds. We also demonstrate that in a covariance model, which generalizes the spiked Wishart model, the deterministic solution in our algorithm achieves a near-optimal approximation ratio. We demonstrate the efficacy of our algorithm through numerical tests on real-world datasets.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Sparse Principal Component Analysis | Eisen 1 | Time0.296 | 12 | |
| Sparse Principal Component Analysis | News | Objective Value7.37e+3 | 6 | |
| Sparse Principal Component Analysis | Eisen2 | Objective Value27.573 | 6 | |
| Sparse Principal Component Analysis | CovColon | Objective Value1.23e+4 | 6 | |
| Sparse Principal Component Analysis | LymphomaCov | Objective Function Value2.06e+3 | 6 | |
| Sparse Principal Component Analysis | Reddit1500 | Objective Value1.20e+3 | 6 | |
| Sparse Principal Component Analysis | Reddit2000 | Objective Value2.54e+3 | 6 |