
Direct Bethe Free Energy Minimization for Bayesian Neural Network

About

We propose training Bayesian neural networks by directly minimizing the Bethe free energy rather than maximizing a variational lower bound. On tree-structured factor graphs the Bethe free energy is exact; deterministic layers drop out of the objective and are trained by standard backpropagation, so the framework accommodates any mixture of probabilistic and deterministic subgraphs without modification. Restricting the weight posterior to a last-layer Gaussian yields analytically tractable losses: for a Gaussian likelihood the Bethe loss equals the exact marginal likelihood, and for a probit likelihood it reduces to a closed form via the probit-Gaussian convolution. Both objectives sit strictly between MAP and the ELBO ($L_\text{MAP} \leq L_\text{Bethe} \leq L_\text{ELBO}$), removing the structural Jensen gap that no choice of variational family can close. The Z-consistent prior formulation makes the prior precision a differentiable parameter, enabling empirical Bayes - joint optimization of weights, covariance, and hyperparameters - in a single gradient pass, with no cross-validation or outer loop. All variants admit a closed-form predictive at MAP-equivalent inference cost, in contrast to ensemble and sampling-based methods. On 8 UCI regression and 12 UCI classification benchmarks evaluated under a single shared hyperparameter regime, Bethe is competitive with standard reference methods at single-pass cost. Independently, joint single-pass empirical Bayes matches grid-search cross-validation of the prior precision on essentially all dataset-variant combinations, eliminating the outer hyperparameter loop without measurable cost. Isolated optimization gaps on a few datasets reflect numerical rather than principled limitations of the framework.
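The probit-Gaussian convolution mentioned above is presumably the standard identity $\int \Phi(a)\,\mathcal{N}(a;\mu,\sigma^2)\,da = \Phi\big(\mu/\sqrt{1+\sigma^2}\big)$, where $a$ would be the last-layer activation under a Gaussian weight posterior. The sketch below is not the author's implementation: it only checks the identity numerically, and all function names and the test values of the mean and variance are hypothetical, chosen for illustration.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_gaussian_closed_form(mean: float, var: float) -> float:
    """E[Phi(a)] for a ~ N(mean, var), via the closed-form identity."""
    return std_normal_cdf(mean / math.sqrt(1.0 + var))


def probit_gaussian_numeric(mean: float, var: float,
                            n: int = 20001, width: float = 10.0) -> float:
    """Same expectation by trapezoidal quadrature over mean +/- width * std."""
    std = math.sqrt(var)
    lo = mean - width * std
    hi = mean + width * std
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        a = lo + i * h
        # Gaussian density of the activation a (assumed N(mean, var)).
        density = math.exp(-0.5 * ((a - mean) / std) ** 2) / (
            std * math.sqrt(2.0 * math.pi)
        )
        # Trapezoid rule: end points carry half weight.
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * std_normal_cdf(a) * density
    return total * h


if __name__ == "__main__":
    # Hypothetical activation statistics, for illustration only.
    mean, var = 0.7, 2.5
    print(probit_gaussian_closed_form(mean, var))
    print(probit_gaussian_numeric(mean, var))
```

The two printed values should agree to many decimal places, which is why a probit likelihood with a Gaussian activation admits a sampling-free predictive: one CDF evaluation replaces the integral.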

Pavel Prochazka • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Regression | UCI ENERGY (test) | Negative Log Likelihood 0.838 | 62 |
| Regression | UCI CONCRETE (test) | Neg Log Likelihood 3.416 | 51 |
| Regression | UCI POWER (test) | Negative Log Likelihood 2.875 | 43 |
| Regression | UCI NAVAL (test) | Negative Log Likelihood -3.141 | 42 |
| Regression | UCI WINE (test) | Negative Log Likelihood 0.959 | 38 |
| Regression | Boston UCI (test) | -- | 36 |
| Regression | Kin8nm | RMSE 0.165 | 24 |
| Regression | Energy | RMSE 0.571 | 24 |
| Classification | Ionosphere (UCI) (test) | NLL 0.332 | 17 |
| Classification | Australian (UCI) (test) | NLL 0.349 | 17 |
Showing 10 of 17 rows
