Relay Diffusion: Unifying diffusion process across resolutions for image synthesis

About

Diffusion models have achieved great success in image synthesis but still face challenges in high-resolution generation. Through the lens of the discrete cosine transform (DCT), we find that the main reason is that the same noise level at a higher resolution results in a higher signal-to-noise ratio (SNR) in the frequency domain. In this work, we present the Relay Diffusion Model (RDM), which uses blurring diffusion and block noise to transfer a low-resolution image or noise into an equivalent high-resolution one for the diffusion model. The diffusion process can therefore continue seamlessly at any new resolution or in any new model, without restarting from pure noise or relying on low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256×256, surpassing previous works such as ADM, LDM, and DiT by a large margin. All code and checkpoints are open-sourced at https://github.com/THUDM/RelayDiffusion.
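The frequency-domain argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the released RDM code: it uses an orthonormal 2D DCT-II, treats nearest-neighbour upsampling of a low-resolution image as "the same image at a higher resolution", and looks only at the lowest-frequency (DC) coefficient. The helper names (`dct_matrix`, `dct2`, `upsample_nearest`, `dc_snr`) and the values of `s` and `sigma` are hypothetical. Independent per-pixel noise with standard deviation σ keeps standard deviation σ in every DCT coefficient, while the DC coefficient of the upsampled image grows by the upsampling factor s, so the DC SNR rises by s². "Block noise" here means low-resolution noise upsampled the same way, a simplified stand-in for the paper's block noise; its DC coefficient also grows by s, so the DC SNR matches the low-resolution one.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row gets its own normalisation
    return c


def dct2(x: np.ndarray) -> np.ndarray:
    """Separable 2D orthonormal DCT of a square array (applied along both axes)."""
    c = dct_matrix(x.shape[0])
    return c @ x @ c.T


def upsample_nearest(x: np.ndarray, s: int) -> np.ndarray:
    """Nearest-neighbour upsampling: every pixel becomes an s x s block."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)


def dc_snr(image: np.ndarray, noise_dc_std: float) -> float:
    """Power SNR of the DC (lowest-frequency) DCT coefficient."""
    return float(dct2(image)[0, 0] ** 2 / noise_dc_std ** 2)


rng = np.random.default_rng(0)
s = 4            # hypothetical upsampling factor, e.g. 32x32 -> 128x128
sigma = 0.5      # per-pixel noise std, the "same noise level" at both resolutions
low = rng.uniform(0.2, 1.0, size=(32, 32))  # stand-in for a low-resolution image
high = upsample_nearest(low, s)             # same content at s times the resolution

# i.i.d. pixel noise stays i.i.d. with std sigma under an orthonormal DCT,
# so its DC coefficient has std sigma at both resolutions.
snr_low = dc_snr(low, sigma)
snr_high_iid = dc_snr(high, sigma)

# Block noise (low-res i.i.d. noise upsampled by s) has DC std s * sigma,
# because the DC coefficient of any nearest-upsampled array scales by s.
snr_high_block = dc_snr(high, s * sigma)

print("DC SNR ratio, i.i.d. noise:", snr_high_iid / snr_low)   # s**2 up to rounding
print("DC SNR ratio, block noise:", snr_high_block / snr_low)  # 1 up to rounding
```

Because the orthonormal DCT leaves white noise white, the mismatch in this toy setting comes entirely from the signal side: the image's low-frequency coefficients grow with resolution while i.i.d. noise does not. Noise that is correlated within s×s blocks grows the same way at low frequencies, which is consistent with the intuition for adding block noise when moving to a higher resolution.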

Jiayan Teng, Wendi Zheng, Ming Ding, Wenyi Hong, Jianqiao Wangni, Zhuoyi Yang, Jie Tang• 2023

Related benchmarks

Task	Dataset	Metric	Result
Class-conditional Image Generation	ImageNet 256x256	Inception Score (IS)	260.4	967
Class-conditional Image Generation	ImageNet 256x256 (val)	Inception Score (IS)	260.4	493
Image Generation	ImageNet 256x256 (val)	FID	1.89	399
Class-conditional Image Generation	ImageNet class-conditional 256x256 (test val)	FID	1.87	81
Class-to-image generation	ImageNet 256x256	FID	1.99	38
Unconditional Image Generation	CelebA-HQ 256x256	Fréchet Distance (FD)	5.77	37
Unconditional image synthesis	CelebA-HQ 256x256	FID	3.15	16
Image Synthesis	CelebA-HQ 256x256 (test)	FID	3.2	5
