
Ctrl-Z Sampling: Scaling Diffusion Sampling with Controlled Random Zigzag Explorations

About

Diffusion models generate conditional samples by progressively denoising Gaussian noise, yet the denoising trajectory can stall at visually plausible but low-quality outcomes with conditional misalignment or structural artifacts. We interpret this behavior as local optima in a surrogate quality landscape: once early denoising commits to a suboptimal global structure, later steps mainly sharpen details and seldom correct the underlying mistake. While existing inference-time approaches explore alternative diffusion states via re-noising with fixed strength or direction, they exhibit limited capacity to escape steep quality plateaus. We propose Controlled Random Zigzag Sampling (Ctrl-Z Sampling), a scalable sampling strategy that detects plateaus in the quality landscape via a surrogate score and allocates exploration only when a plateau is detected. Upon detection, Ctrl-Z Sampling rolls back to noisier states, samples a set of alternative continuations, and updates the trajectory when a candidate improves the score, otherwise escalating the exploration depth to escape the current plateau. The proposed method is model-agnostic and broadly compatible with existing diffusion frameworks. Experiments show that Ctrl-Z Sampling consistently improves generation quality over other inference-time scaling samplers across different NFE budgets, offering a scalable compute-quality trade-off.
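The abstract describes a concrete control loop: denoise step by step, detect a plateau in a surrogate score, roll back to a noisier state, sample alternative continuations, accept an improving candidate, and otherwise escalate the rollback depth. The following is a minimal toy sketch of that loop in Python; the denoiser, re-noising rule, and surrogate score are all hypothetical 1-D stand-ins, not the paper's actual models or scoring function.

```python
import random

# Toy stand-ins (assumptions for illustration, not the paper's components):
# - denoise_step: one denoising update, here a contraction toward 0
# - renoise:      rollback to a noisier state, noise scaled by depth
# - score:        surrogate quality score (higher is better)

def denoise_step(x, t):
    return x * 0.8  # toy denoiser: pull the sample toward the "clean" point 0

def renoise(x, depth, rng):
    return x + rng.gauss(0.0, 0.1 * depth)  # toy rollback: deeper = noisier

def score(x):
    return -abs(x)  # toy surrogate score: best sample is x == 0

def ctrl_z_sampling(x0, n_steps=20, n_candidates=4, max_depth=3,
                    plateau_eps=1e-3, seed=0):
    rng = random.Random(seed)
    x = x0
    prev = score(x)
    for t in range(n_steps):
        x = denoise_step(x, t)
        cur = score(x)
        if cur - prev < plateau_eps:      # plateau: score barely improved
            depth = 1
            while depth <= max_depth:
                # sample alternative continuations from a noisier state
                candidates = [denoise_step(renoise(x, depth, rng), t)
                              for _ in range(n_candidates)]
                best = max(candidates, key=score)
                if score(best) > cur:     # accept only an improving candidate
                    x, cur = best, score(best)
                    break
                depth += 1                # no improvement: escalate exploration
        prev = cur
    return x
```

Because exploration is triggered only on detected plateaus and candidates are accepted only when they improve the surrogate score, extra compute (more candidates, deeper rollbacks) trades directly for quality, matching the compute-quality scaling described above.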

Shunqi Mao, Wei Guo, Chaoyi Zhang, Jieting Long, Ke Xie, Weidong Cai• 2025

Related benchmarks

Task                      Dataset        Metric          Result  Rank
Text-to-Image Generation  Pick-a-Pic     ImageReward     1.441   107
Text-to-Image Generation  T2I-CompBench  Color Fidelity  73.77   46
Text-to-Image Generation  DrawBench      PickScore       22.62   23
