Hierarchical Variational Policies for Reward-Guided Diffusion

About

Adapting pretrained diffusion models to downstream objectives such as inverse problems often requires expensive test-time guidance or optimization. We propose a principled framework for generating high-quality reward-aligned samples at substantially reduced inference cost. Our approach formulates test-time adaptation as a hierarchical variational model, where control is amortized into a lightweight yet expressive stochastic policy. This formulation naturally supports few-step diffusion sampling: large step sizes enable fast inference, while the learned policy maintains sample quality by providing structured per-step control. The resulting fully amortized sampler achieves a strong quality-speed tradeoff, matching or exceeding recent test-time scaling baselines while requiring significantly less compute. For example, on 4x super-resolution, our method achieves better perceptual quality with more than 5x faster inference compared to the best-performing baseline. We further extend our approach to a semi-amortized regime that combines cheap amortized proposals with limited test-time optimization, achieving state-of-the-art perceptual quality across several challenging inverse problems.
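To make the idea of few-step sampling with per-step control more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the denoiser, the policy, the noise schedule, and all function names (`toy_denoiser`, `toy_policy`, `few_step_sample`) are hypothetical stand-ins chosen for illustration. In particular, the policy below is a hand-written Gaussian pull toward a target `y`, whereas the abstract describes a learned policy.

```python
import numpy as np

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for a pretrained denoiser.

    For data distributed as N(0, I) and noise level sigma, the posterior
    mean of the clean sample given x is x / (1 + sigma^2).
    """
    return x / (1.0 + sigma**2)

def toy_policy(x, sigma, y, rng, scale=0.5, noise_std=0.05):
    """Hypothetical stochastic policy (hand-written, NOT learned).

    Returns a Gaussian control whose mean pulls x toward the target y
    and whose spread shrinks with the current noise level.
    """
    mean = scale * (y - x)
    return mean + noise_std * sigma * rng.standard_normal(x.shape)

def few_step_sample(y, sigmas, rng):
    """Run a few large Euler steps, adding a policy control after each step."""
    x = sigmas[0] * rng.standard_normal(y.shape)  # start from pure noise
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = toy_denoiser(x, s_cur)
        # Euler step of the probability-flow ODE from s_cur down to s_next.
        x = x0_hat + (s_next / s_cur) * (x - x0_hat)
        # Per-step control sampled from the (toy) stochastic policy.
        x = x + toy_policy(x, s_cur, y, rng)
    return x

rng = np.random.default_rng(0)
target = np.ones(4)                      # assumed toy "measurement"
schedule = [10.0, 3.0, 1.0, 0.3, 0.0]    # assumed 4-step noise schedule
sample = few_step_sample(target, schedule, rng)
```

The sketch only illustrates the control flow: each coarse sampling step is followed by a cheap stochastic correction, so no inner optimization loop runs at inference time.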

Kushagra Pandey, Farrin Marouf Sofian, Jan Niklas Groeneveld, Felix Draxler, Stephan Mandt • 2026

Related benchmarks

Task                      Dataset               Metric               Result   Rank
Super-Resolution          FFHQ 256 x 256        PSNR                 29.9     52
Super-Resolution          ImageNet 256          PSNR                 23.67    50
Inpainting                ImageNet 256          PSNR                 20.74    30
HDR Reconstruction        FFHQ 256 x 256        PSNR                 25.52    14
HDR Reconstruction        ImageNet 256 x 256    PSNR                 22.65    13
Random Inpainting (90%)   ImageNet 256          Time (s)             1.3      10
Super-resolution (x8)     ImageNet 256          Time (s)             1.3      10
Random Inpainting (90%)   FFHQ-256              Inference Time (s)   0.6      10
Super-Resolution (x4)     FFHQ-256              Time (s)             0.6      10
Super-resolution (x8)     FFHQ-256              Time (s)             0.6      10
Showing 10 of 13 rows
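
Several of the results above are reported as PSNR. The page does not say how those values were computed (data range, color handling, averaging), so the following is only a sketch of the standard definition, 10 * log10(MAX^2 / MSE), with a hypothetical helper name `psnr` and an assumed default data range of 1.0.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    data_range is the assumed maximum possible pixel value
    (1.0 for images scaled to [0, 1], 255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(data_range**2 / mse)

# A constant error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
value = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))
```

Higher PSNR means the reconstruction is closer to the reference in mean squared error; it does not by itself measure perceptual quality.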
