Best Subset Selection via a Modern Optimization Lens

About

In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than {\texttt {Lasso}} and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

Dimitris Bertsimas, Angela King, Rahul Mazumder• 2015

Related benchmarks

Task	Dataset	Result
Sparse Classification	gisette scale	CPU Cost16.11	40
Sparsity-constrained Optimization	duke	CPU Time4.07	40
Regression	E2006 tfidf	MSE0.152	25
Classification	news20 (test)	CPU Score904.5	20
Regression	E2006-log1p (test)	CPU Time3.00e+3	20
Sparsity-constrained Optimization	colon-cancer	CPU Efficiency0.44	20
Sparsity-constrained Optimization	gisette scale	CPU Time13.35	20
Classification	colon-cancer	CPU1.69	20
Sparse Regression	colon-cancer	CPU Time2.03	20
Sparse Regression	leukemia	CPU Time4.22	20

Showing 10 of 25 rows

Other info

Follow for update

@wizwand_team Discord