
Feature Selection using Stochastic Gates

About

Feature selection has been extensively studied for linear estimation (for instance, the Lasso), but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that indicate whether each feature is selected. Our approach relies on a continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification for incorporating Bernoulli distributions into our approach and demonstrate its potential on synthetic and real-life applications.
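The core mechanism can be illustrated with a minimal sketch of one Gaussian-based continuous relaxation of a Bernoulli gate. Each feature $d$ gets a learnable parameter $\mu_d$; during training the gate is $z_d = \mathrm{clip}(\mu_d + \epsilon_d, 0, 1)$ with $\epsilon_d \sim \mathcal{N}(0, \sigma^2)$, and the expected number of open gates $\sum_d P(z_d > 0) = \sum_d \Phi(\mu_d / \sigma)$ serves as a differentiable surrogate for the $\ell_0$ penalty. The parameter values and function names below are illustrative assumptions, not the paper's reference implementation:

```python
import math
import random

SIGMA = 0.5  # gate noise scale; a hyperparameter, value assumed for illustration

def sample_gate(mu, rng):
    """Sample one relaxed-Bernoulli gate: z = clip(mu + eps, 0, 1), eps ~ N(0, SIGMA^2)."""
    return min(1.0, max(0.0, mu + rng.gauss(0.0, SIGMA)))

def expected_open(mu):
    """P(z > 0) = Phi(mu / SIGMA): differentiable surrogate for this gate's l0 cost."""
    return 0.5 * (1.0 + math.erf(mu / (SIGMA * math.sqrt(2.0))))

rng = random.Random(0)
mus = [2.0, -2.0, 0.0]  # hypothetical gate parameters for three features
gates = [sample_gate(m, rng) for m in mus]

# A strongly positive mu keeps its gate (and feature) almost surely open,
# a strongly negative mu keeps it almost surely closed.
l0_surrogate = sum(expected_open(m) for m in mus)  # ~ 1.0 + 0.0 + 0.5
```

In training, `l0_surrogate` would be added (with a regularization weight) to the prediction loss, so gradient descent jointly fits the model and pushes irrelevant features' gates shut.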

Yutaro Yamada, Ofir Lindenbaum, Sahand Negahban, Yuval Kluger · 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Tabular Classification | TabZilla avg across 98 datasets | Mean Accuracy: 81 | 20 |
| Feature Selection | Syn2 | TPR: 100 | 12 |
| Feature Selection | Syn4 | TPR: 100 | 12 |
| Feature Selection | Syn3 | TPR: 100 | 12 |
| Chemistry Task | Chem1 (test) | TPR: 100 | 12 |
| Chemistry Task | Chem2 (test) | TPR: 1 | 12 |
| Chemistry Task | Chem3 (test) | TPR: 1 | 12 |
| Feature Selection | Syn1 | TPR: 100 | 12 |
| Regression | 15 regression datasets | Mean R2: 0.332 | 9 |
| Group Feature Selection | Syn2 | TPR: 1 | 6 |

(Showing 10 of 17 rows.)
