RBF-Solver: A Multistep Sampler for Diffusion Probabilistic Models via Radial Basis Functions
About
Diffusion probabilistic models (DPMs) are widely adopted for their outstanding generative fidelity, yet their sampling is computationally demanding. Polynomial-based multistep samplers mitigate this cost by accelerating inference; however, despite their theoretical accuracy guarantees, they generate the sampling trajectory according to a predefined scheme, leaving no room for further optimization. To address this limitation, we propose RBF-Solver, a multistep diffusion sampler that interpolates model evaluations with Gaussian radial basis functions (RBFs). By leveraging learnable shape parameters in the Gaussian RBFs, RBF-Solver explicitly follows optimal sampling trajectories. At first order, it reduces to the Euler method (DDIM). At second order or higher, as the shape parameters approach infinity, RBF-Solver converges to the Adams method, ensuring compatibility with existing samplers. Owing to the locality of Gaussian RBFs, RBF-Solver maintains high image fidelity even at fourth order or higher, where previous samplers deteriorate. For unconditional generation, RBF-Solver consistently outperforms polynomial-based samplers in the high-NFE regime (NFE >= 15). On CIFAR-10 with the Score-SDE model, it achieves an FID of 2.87 with 15 function evaluations, improving to 2.48 with 40 function evaluations. For conditional ImageNet 256 x 256 generation with the Guided Diffusion model at a guidance scale of 8.0, substantial gains are achieved in the low-NFE range (5-10), yielding a 16.12% to 33.73% reduction in FID relative to polynomial-based samplers.
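The limiting behaviour described above can be illustrated on a toy problem. The following is a minimal sketch, not the RBF-Solver implementation: it interpolates three samples of a 1D function with Gaussian RBFs and compares the extrapolated value with that of the quadratic polynomial through the same points, the kind of interpolant that Adams-type methods are built on. The kernel form `exp(-(r / gamma)**2)`, the name `gamma`, the helper functions, and the test function `sin` are all assumptions made for illustration; the paper's exact parameterization of the shape parameter may differ.

```python
import numpy as np

def gaussian_rbf_interpolate(nodes, values, query, gamma):
    """Fit Gaussian RBFs through (nodes, values) and evaluate at `query`.

    `gamma` is a hypothetical width-style shape parameter, assumed here to
    enter the kernel as phi(r) = exp(-(r / gamma)**2). Larger gamma means
    flatter basis functions.
    """
    nodes = np.asarray(nodes, dtype=float)
    values = np.asarray(values, dtype=float)
    # Kernel matrix K[i, j] = phi(|x_i - x_j|).
    diff = nodes[:, None] - nodes[None, :]
    kernel = np.exp(-(diff / gamma) ** 2)
    # Solve for the weights that reproduce the values at the nodes.
    weights = np.linalg.solve(kernel, values)
    # Evaluate the weighted sum of basis functions at the query point.
    k_query = np.exp(-((query - nodes) / gamma) ** 2)
    return float(k_query @ weights)

def polynomial_interpolate(nodes, values, query):
    """Evaluate the unique polynomial of degree len(nodes) - 1 through the data."""
    coeffs = np.polyfit(nodes, values, deg=len(nodes) - 1)
    return float(np.polyval(coeffs, query))

# Three past "evaluations" of a smooth toy function, extrapolated one step ahead.
nodes = [0.0, 0.5, 1.0]
values = [float(np.sin(x)) for x in nodes]
query = 1.5

poly_value = polynomial_interpolate(nodes, values, query)
gaps = {}
for gamma in (0.5, 2.0, 8.0, 32.0):
    rbf_value = gaussian_rbf_interpolate(nodes, values, query, gamma)
    gaps[gamma] = abs(rbf_value - poly_value)
    print(f"gamma={gamma:5.1f}  |rbf - polynomial| = {gaps[gamma]:.3e}")
```

With a small `gamma` the basis functions are highly local and the extrapolated value differs markedly from the polynomial one; with a large `gamma` the two nearly coincide. The kernel matrix becomes increasingly ill-conditioned as `gamma` grows, so a sketch like this should not be pushed to very large values without a more stable formulation.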
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | Stable Diffusion V1.4 | RMSE Loss | 0.0503 | 280 |
| Image Generation | ImageNet 256 10k samples | FID | 7.28 | 165 |
| Class-conditional Image Generation | ImageNet 128x128 | FID | 4.28 | 155 |
| Text-to-Image Generation | Stable Diffusion 1.4 | CLIP Cosine Similarity | 0.9893 | 140 |
| Conditional Image Generation | ImageNet 256x256 Guided-Diffusion (10k samples) | FID | 7.31 | 128 |
| Text-to-Image Generation | Stable Diffusion 10k samples v1.4 | CLIP Similarity | 98.93 | 119 |
| Image Generation | CIFAR-10 32x32 EDM (test) | FID | 1.98 | 79 |
| Image Generation | ImageNet 64x64 (50k samples) | FID | 17.95 | 44 |
| Unconditional Image Generation | CIFAR-10 32x32 Score-SDE 50k samples (test) | FID | 2.48 | 44 |
| Unconditional Image Generation | CIFAR-10 32x32 EDM (test) | FID | 1.98 | 44 |