Preconditioned Regularized Wasserstein Proximal Sampling
About
We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, in which the score function is approximated by the numerically tractable score of a regularized Wasserstein proximal operator. The method is derived by applying a Cole-Hopf transformation to coupled anisotropic heat equations, which yields a kernel formulation of the preconditioned regularized Wasserstein proximal. The diffusion component of the proposed method can also be interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization and is independent of the step size. Experiments demonstrate acceleration and particle-level stability on problems ranging from log-concave and non-log-concave toy examples to Bayesian total-variation regularized image deconvolution, and competitive or better performance on non-convex Bayesian neural network training when using variable preconditioning matrices.
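To make the idea of a noise-free, preconditioned particle update concrete, here is a minimal sketch in Python. It is not the method of the paper: the score of the regularized Wasserstein proximal is replaced by the score of a plain Gaussian kernel density estimate, used purely as a stand-in. The names `kde_score` and `preconditioned_step`, the bandwidth `h`, the step size `eta`, and the choice `M = inv(A)` are all assumptions made for illustration. The update moves each particle along `-M (grad V(x) + beta * score(x))`, which is a deterministic counterpart of preconditioned Langevin dynamics for a target proportional to `exp(-V(x) / beta)`.

```python
import numpy as np


def kde_score(x, h):
    """Score (gradient of the log density) of a Gaussian KDE, evaluated at each particle.

    x has shape (N, d); h is the kernel bandwidth. This is a stand-in for the
    regularized Wasserstein proximal score, not the kernel used in the paper.
    """
    diff = x[None, :, :] - x[:, None, :]               # diff[i, j] = x_j - x_i
    logits = -np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)               # row-wise softmax weights
    return np.einsum("ij,ijd->id", w, diff) / h ** 2


def preconditioned_step(x, grad_V, M, beta, eta, h):
    """One deterministic step x <- x - eta * M (grad V(x) + beta * score(x))."""
    v = grad_V(x) + beta * kde_score(x, h)
    return x - eta * v @ M.T                           # applies M to every row of v


if __name__ == "__main__":
    # Hypothetical quadratic potential V(x) = 0.5 * x^T A x, so grad V(x) = A x.
    A = np.diag([1.0, 4.0])
    M = np.linalg.inv(A)                               # assumed preconditioner
    grad_V = lambda x: x @ A.T

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 2)) + 3.0                # particles start off-target
    for _ in range(300):
        x = preconditioned_step(x, grad_V, M, beta=1.0, eta=0.05, h=0.3)

    print("particle mean:", x.mean(axis=0))
    print("particle covariance:\n", np.cov(x.T))
```

The row-wise softmax over pairwise squared distances gives a loose sense of why a kernel-based score can resemble attention weights, though the modified self-attention block described in the abstract is built from a different kernel. Setting `beta=0` reduces the step to preconditioned gradient descent, which is a convenient sanity check.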
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Deconvolution | set3c butterfly image (test) | PSNR | 37.35 | 18 |
| Bayesian Neural Networks | UCI Boston (test) | RMSE | 2.866 | 16 |
| Bayesian Neural Network Regression | Combined (test) | RMSE | 3.925 | 12 |
| Bayesian Neural Network Regression | kin8nm (test) | RMSE | 0.087 | 12 |
| Bayesian Neural Network Regression | concrete (test) | RMSE | 4.387 | 12 |
| Bayesian Neural Network Regression | WINE (test) | RMSE | 0.612 | 12 |