# Singular Bayesian Neural Networks

## About
Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. This singularity captures structured weight correlations through shared latent factors, geometrically distinct from mean-field's independence assumption. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{m n}$, and prove loss bounds that decompose the error into optimization and rank-induced bias using the Eckart-Young-Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves predictive performance competitive with 5-member Deep Ensembles while using up to $15\times$ fewer parameters. Furthermore, it substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines.
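The parameterization above can be illustrated with a minimal NumPy sketch (not the authors' implementation; all names are illustrative): each weight matrix is drawn as $W = AB^{\top}$, with independent Gaussian variational factors over the entries of $A$ and $B$, so a sample of $W$ lies on the rank-$r$ manifold and costs $2r(m+n)$ variational parameters rather than $2mn$ for mean-field.

```python
import numpy as np

def sample_lowrank_weight(mu_A, rho_A, mu_B, rho_B, rng):
    """Draw one posterior sample W = A @ B.T.

    mu_*, rho_* hold the variational means and pre-softplus scales
    for the factors A (m x r) and B (n x r).
    """
    sigma_A = np.log1p(np.exp(rho_A))  # softplus keeps std. devs positive
    sigma_B = np.log1p(np.exp(rho_B))
    A = mu_A + sigma_A * rng.standard_normal(mu_A.shape)
    B = mu_B + sigma_B * rng.standard_normal(mu_B.shape)
    return A @ B.T  # (m, n) matrix of rank at most r

rng = np.random.default_rng(0)
m, n, r = 8, 5, 2
mu_A, rho_A = rng.standard_normal((m, r)), -3.0 * np.ones((m, r))
mu_B, rho_B = rng.standard_normal((n, r)), -3.0 * np.ones((n, r))

W = sample_lowrank_weight(mu_A, rho_A, mu_B, rho_B, rng)
print(W.shape)                   # (8, 5)
print(np.linalg.matrix_rank(W))  # at most r = 2
print(2 * r * (m + n), "low-rank params vs", 2 * m * n, "mean-field")
```

Because $A$ and $B$ are shared across all entries of $W$, sampled weights are correlated within rows and columns, which is exactly the structure a fully factorized mean-field posterior cannot express.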
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | FashionMNIST (test) | Accuracy | 89.07 | 218 |
| Out-of-Distribution Detection | MIMIC-III Newborn ICU Records | AUC-OOD | 0.802 | 6 |
| Out-of-Distribution Detection | Beijing Air Quality | AUROC (OOD) | 71 | 6 |
| Time Series Forecasting | Beijing Air Quality | MAE | 10.63 | 6 |
| ICU Mortality Prediction | MIMIC-III Adult ICU Records | AUROC | 0.898 | 6 |
| Uncertainty Calibration | Beijing Air Quality (test) | Calibration AUC | 0.126 | 5 |
| Classification | MNIST (test) | Accuracy | 98.21 | 4 |
| Regression | Toy dataset (test) | Single-Pass RMSE | 0.5073 | 2 |