Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares
About
The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation (Candes and Tao, 2009; Mazumder, Hastie and Tibshirani, 2010) and maximum-margin matrix factorization (Srebro, Rennie and Jaakkola, 2005). These two procedures are in some cases solving equivalent problems, but with quite different algorithms. In this article we bring the two approaches together, leading to an efficient algorithm for large matrix factorization and completion that outperforms both of these. We develop a software package "softImpute" in R for implementing our approaches, and a distributed version for very large matrices using the "Spark" cluster programming environment.
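To make the nuclear-norm approach mentioned above more concrete, here is a minimal NumPy sketch of the basic soft-impute iteration: fill the missing entries with the current estimate, take an SVD, and soft-threshold the singular values. It illustrates the earlier nuclear-norm method cited in the abstract, not the fast alternating-least-squares algorithm this article proposes and not the code of the softImpute R package. The names `soft_impute`, `mask`, `lam`, `n_iters` and `tol` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_impute(X, mask, lam, n_iters=200, tol=1e-6):
    """Illustrative soft-impute iteration (a sketch, not the softImpute package).

    X    : 2-D array; values where mask is False are ignored.
    mask : boolean array, True where an entry of X is observed.
    lam  : soft-threshold applied to the singular values.
    """
    X = np.asarray(X, dtype=float)
    Z = np.zeros_like(X)  # current low-rank estimate
    for _ in range(n_iters):
        # Keep observed entries, fill missing ones with the current estimate.
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values (shrink toward zero by lam).
        s_thr = np.maximum(s - lam, 0.0)
        Z_new = (U * s_thr) @ Vt
        # Stop when the relative change in the estimate is small.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Usage on synthetic data: a rank-2 matrix with roughly 70% of entries observed.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
observed = rng.random(A.shape) < 0.7
Z_hat = soft_impute(np.where(observed, A, 0.0), observed, lam=0.5)
rel_err = np.linalg.norm((Z_hat - A)[~observed]) / np.linalg.norm(A[~observed])
print(f"relative error on held-out entries: {rel_err:.3f}")
```

Each iteration here costs a full dense SVD, which is affordable only for small matrices; avoiding that cost at scale is the kind of problem the article's algorithm is aimed at.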
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Tabular Imputation | MissBench (test) | MCAR Score | 0.265 | 15 |
| Tabular Data Imputation | MissBench (overall) | MCAR Score | 46.4 | 15 |
| Imputation | OpenML MCAR, Missing Probability 0.4 (test) | MAD | 0.001 | 13 |
| Link Prediction | Primary school network of interactions | Normalized Squared Frobenius Error | 0.357 | 4 |
| Link Prediction | Network of co-authorship 892 nodes (50% missing values) | Normalized Squared Frobenius Error | 0.894 | 3 |