Joint Model and Data Sparsification via the Marginal Likelihood
About
Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian mechanism for feature sparsity via marginal likelihood optimization. Yet its reliance on a homoscedastic noise model renders it sensitive to data contamination such as outliers or misspecified noise, which harms model fit and predictions. We instead propose jointly learning individual feature and sample relevancies, enabling simultaneous model and data sparsification via a single Bayesian objective. This symmetric pruning of model and data is a natural extension that preserves conjugacy, admits closed-form updates for standard optimization procedures, and aligns with perspectives from robust regression and influence functions. Empirical results across diverse regression tasks affirm that a joint ARD approach consistently yields both sparse and robust prediction models.
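To make the idea of a single objective over feature and sample relevancies concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian linear model with a zero-mean ARD prior of precision `alpha[j]` on each weight and an independent noise precision `beta[i]` on each sample, so that the targets have marginal covariance `diag(1/beta) + Phi diag(1/alpha) Phi^T`. The function name, the symbols `Phi`, `alpha`, `beta`, and the toy data are illustrative assumptions; the paper's exact parameterization and update rules are not reproduced here.

```python
import numpy as np


def log_marginal_likelihood(Phi, y, alpha, beta):
    """Log evidence of y under w ~ N(0, diag(1/alpha)), noise_i ~ N(0, 1/beta_i).

    Assumed model for illustration only; not taken from the paper.
    """
    n = y.shape[0]
    # Marginal covariance of y: per-sample noise variances plus the
    # weight prior pushed through the design matrix.
    C = np.diag(1.0 / beta) + (Phi / alpha) @ Phi.T
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, y)  # z @ z equals y^T C^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)


# Toy data (hypothetical): one constant feature and one grossly outlying target.
Phi = np.ones((4, 1))
y = np.array([0.0, 0.0, 0.0, 100.0])
alpha = np.ones(1)

# Homoscedastic noise versus a tiny precision on the outlying sample.
ll_uniform = log_marginal_likelihood(Phi, y, alpha, np.ones(4))
ll_downweighted = log_marginal_likelihood(
    Phi, y, alpha, np.array([1.0, 1.0, 1.0, 1e-4])
)
print(ll_uniform, ll_downweighted)
```

On this toy example, `ll_downweighted` exceeds `ll_uniform`: under the assumed model, the evidence favors a small precision for the outlying sample, which is the sample-side counterpart of ARD driving an irrelevant feature's prior precision toward infinity.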
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Kernel regression | Boston 20% (test) | RMSE | 3.303 | 28 |
| Kernel regression | Boston 20% n=506 (test) | NLL | 2.593 | 20 |
| Regression | Boston (test) | NLL | 2.931 | 12 |
| Regression | Power 10% outlier contamination (test) | RMSE | 4.19 | 11 |
| Regression | Kin8nm 10% outlier contamination (test) | RMSE | 0.136 | 11 |
| Regression | Elevators 10% outlier contamination (test) | RMSE | 0.003 | 11 |
| Regression | Yacht 10% outlier contamination | RMSE | 3.75 | 11 |
| Regression | Concrete 10% outlier contamination | RMSE | 7.55 | 11 |
| Regression | Kin8nm (no contamination) | RMSE | 0.132 | 11 |
| Regression | Elevators (no contamination) | RMSE | 0.003 | 11 |