Hierarchical Variational Policies for Reward-Guided Diffusion
About
Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality, reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model in which control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality-speed trade-off, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4× super-resolution, our method achieves better perceptual quality with more than 5× faster inference than the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.
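The abstract describes the sampler only at a high level. As a rough illustration of what a few-step sampler with structured per-step stochastic control can look like, the sketch below runs a DDIM-style update with a handful of large steps and, at each step, adds a control drawn from a Gaussian policy to the denoiser's clean-sample prediction, while accumulating the controls' log-density (the kind of term a variational objective needs). Everything here is a hypothetical assumption for illustration, not the authors' implementation: the linear noise schedule, the closed-form `base_denoiser` (exact only for N(0, I) toy data), the linear-Gaussian policy parameters `(W, b, log_std)`, and the Gaussian inverse-problem `reward`.

```python
import numpy as np


def alpha_sigma(t):
    # Hypothetical linear schedule: x_t = (1 - t) * x0 + t * eps, t in [0, 1].
    return 1.0 - t, t


def base_denoiser(x_t, t):
    """Toy stand-in for a pretrained x0-predictor (an assumption, not a real model).

    For data x0 ~ N(0, I), E[x0 | x_t] = alpha_t * x_t / (alpha_t^2 + sigma_t^2).
    """
    a, s = alpha_sigma(t)
    return a * x_t / (a**2 + s**2)


def reward(x, A, y, noise_std):
    # Gaussian log-likelihood (up to a constant) for a hypothetical y = A x + noise.
    r = A @ x - y
    return -0.5 * np.sum(r**2) / noise_std**2


def policy_sampler(policy, x_T, num_steps, rng):
    """Few-step DDIM-style sampler with a per-step stochastic control.

    policy[k] = (W, b, log_std) defines a hypothetical control
    u_k ~ N(W @ x0_hat + b, diag(exp(log_std))^2) that is added to the
    denoiser's x0 prediction at step k. Returns the final sample and the
    summed log-density of all sampled controls.
    """
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x, log_q = x_T, 0.0
    for k in range(num_steps):
        t, s = ts[k], ts[k + 1]
        a_t, sig_t = alpha_sigma(t)
        a_s, sig_s = alpha_sigma(s)
        x0_hat = base_denoiser(x, t)
        eps_hat = (x - a_t * x0_hat) / sig_t  # implied noise estimate
        W, b, log_std = policy[k]
        mean, std = W @ x0_hat + b, np.exp(log_std)
        u = mean + std * rng.standard_normal(x.shape)  # sampled control
        log_q += np.sum(-0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi))
        # Re-noise the controlled x0 prediction to the next (smaller) time s.
        x = a_s * (x0_hat + u) + sig_s * eps_hat
    return x, log_q


d, K = 4, 4
rng = np.random.default_rng(0)
x_T = rng.standard_normal(d)
# Near-deterministic "zero" policy: no mean shift, negligible control noise.
zero_policy = [(np.zeros((d, d)), np.zeros(d), np.full(d, -20.0)) for _ in range(K)]
x_a, log_q = policy_sampler(zero_policy, x_T, K, rng)
x_b, _ = policy_sampler(zero_policy, x_T, K, np.random.default_rng(1))
```

With a zero-mean, near-zero-variance policy the sampler reduces to the plain four-step base sampler (for this toy Gaussian model the output is approximately 0.72 · `x_T`), so runs with different seeds coincide. A trained policy would instead shift each step toward high-`reward` regions; how the policy is actually parameterized and trained in the paper is not reproduced here.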
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Super-Resolution | FFHQ 256 x 256 | PSNR (dB) | 29.9 | 52 |
| Super-Resolution | ImageNet 256 | PSNR (dB) | 23.67 | 50 |
| Inpainting | ImageNet 256 | PSNR (dB) | 20.74 | 30 |
| HDR Reconstruction | FFHQ 256 x 256 | PSNR (dB) | 25.52 | 14 |
| HDR Reconstruction | ImageNet 256 x 256 | PSNR (dB) | 22.65 | 13 |
| Random Inpainting (90%) | ImageNet 256 | Time (s) | 1.3 | 10 |
| Super-Resolution (x8) | ImageNet 256 | Time (s) | 1.3 | 10 |
| Random Inpainting (90%) | FFHQ-256 | Inference Time (s) | 0.6 | 10 |
| Super-Resolution (x4) | FFHQ-256 | Time (s) | 0.6 | 10 |
| Super-Resolution (x8) | FFHQ-256 | Time (s) | 0.6 | 10 |
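The PSNR entries above follow the standard definition PSNR = 10 · log10(MAX² / MSE), reported in dB. The sketch below computes it assuming images stored as arrays scaled to `[0, 1]` (so `data_range=1.0`); the exact evaluation protocol behind these benchmark numbers (color space, data range, cropping) is not stated here and may differ.

```python
import numpy as np


def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range**2 / mse)
```

For example, a uniform error of 0.1 on a `[0, 1]` image gives an MSE of 0.01 and hence a PSNR of 20 dB; each additional 10 dB corresponds to a tenfold reduction in MSE.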