MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
About
We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, an important task in fields such as statistical physics, machine learning, and combinatorial optimization. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework. Our code is available at https://github.com/yuchen-zhu-zyc/MDNS.
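To make "known up to a normalizing constant" concrete, the sketch below evaluates an unnormalized Ising target $\pi\propto\mathrm{e}^{-U}$ and normalizes it by brute force on a tiny lattice. It is an illustration only, not code from the MDNS repository: the function names `ising_energy` and `exact_pmf` are hypothetical, and the energy convention (unit coupling, no external field, periodic boundaries, $\beta$ absorbed into $U$) is an assumption that may differ from the paper's.

```python
import itertools
import math

import numpy as np


def ising_energy(spins: np.ndarray, beta: float) -> float:
    """Assumed convention: U(x) = -beta * sum of x_i * x_j over nearest-neighbour bonds."""
    # Periodic boundaries: pair each site with its right and lower neighbour.
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return float(-beta * np.sum(spins * right + spins * down))


def exact_pmf(side: int, beta: float) -> dict:
    """Normalise exp(-U) by enumerating all 2**(side*side) spin configurations."""
    states = list(itertools.product([-1, 1], repeat=side * side))
    weights = [
        math.exp(-ising_energy(np.array(s).reshape(side, side), beta))
        for s in states
    ]
    z = sum(weights)  # the normalising constant, tractable only for tiny lattices
    return {s: w / z for s, w in zip(states, weights)}


# A 3x3 lattice has 512 states, so enumeration is cheap here.
pmf = exact_pmf(side=3, beta=0.4407)
```

Evaluating `ising_energy` for a single configuration stays cheap at any lattice size, but the sum in `exact_pmf` does not: a 16x16 lattice has $2^{256}$ configurations, which is why a learned sampler that only needs pointwise access to $U$ is attractive.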
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Posterior Sampling | Ising beta=0.4407 16x16 | Sink Metric | 61.6 | 8 |
| Sampling on discretised synthetic densities | 40GMM d=16 | MMD | 0.03 | 8 |
| Sampling on discretised synthetic densities | ManyWell (d=80) | MMD | 0.04 | 8 |
| Sampling on discretised synthetic densities | 40GMM d=32 | MMD | 0.17 | 8 |
| Sampling on discretised synthetic densities | Manywell d = 32 | MMD | 0.03 | 8 |
| Posterior Sampling | Potts 16x16 (q=3, beta=1.005) | Sinkhorn Distance | 84.78 | 8 |
| Posterior Sampling | Potts q=3, beta=1.2 16x16 | Sinkhorn Distance | 99.95 | 8 |
| Posterior Sampling | Ising beta=0.6 16x16 | Sinkhorn Distance | 48.71 | 8 |
| Posterior Sampling | Ising 16x16 (beta=1.2) | Sinkhorn Distance | 126.3 | 8 |