HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

About

Diffusion models have become a mainstream approach for high-resolution image synthesis. However, directly generating higher-resolution images from pretrained diffusion models leads to unreasonable object duplication and exponentially increases the generation time. In this paper, we discover that object duplication arises from feature duplication in the deep blocks of the U-Net. Concurrently, we attribute the extended generation times to self-attention redundancy in the U-Net's top blocks. To address these issues, we propose a tuning-free higher-resolution framework named HiDiffusion. Specifically, HiDiffusion contains a Resolution-Aware U-Net (RAU-Net) that dynamically adjusts the feature map size to resolve object duplication, and a Modified Shifted Window Multi-head Self-Attention (MSW-MSA) that uses optimized window attention to reduce computation. HiDiffusion can be integrated into various pretrained diffusion models to scale image generation resolutions up to 4096x4096 at 1.5-6x the inference speed of previous methods. Extensive experiments demonstrate that our approach addresses object duplication and heavy computation, achieving state-of-the-art performance on higher-resolution image synthesis tasks.
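As a rough illustration of why restricting self-attention to windows reduces computation, the sketch below counts the query-key pairs scored by full self-attention and by attention over non-overlapping windows. It is not the authors' MSW-MSA implementation: the function names, the 128 x 128 token grid, and the 64 x 64 window size are all hypothetical values chosen for the arithmetic, and the window shifting used by MSW-MSA is not modeled.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs scored by full self-attention over a height x width token grid."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention is limited to non-overlapping window x window tiles.

    Assumption for this sketch: height and width are divisible by window.
    """
    if height % window or width % window:
        raise ValueError("grid dimensions must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    # Hypothetical sizes: a 128 x 128 token grid split into four 64 x 64 windows.
    full = global_attention_pairs(128, 128)
    windowed = window_attention_pairs(128, 128, 64)
    print(f"full: {full}, windowed: {windowed}, reduction: {full / windowed:.1f}x")
```

With these assumed sizes, the pair count drops by a factor equal to the number of windows, since each token attends only to the tokens inside its own window.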

Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang • 2023

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	LAION-5B 1,000 prompts	FID (Real): 63.674	20
Text-to-Image Generation	4K Resolution 4K x 4K (test)	CLIP IQA Score: 0.4021	16
Sinogram Completion	TomoBank	SSIM (Sinogram): 90.3	15
Sinogram Completion	TomoBank (test)	Peak GPU Memory (GB): 12.5	14
High-Resolution Image Generation	LAION-5B 2x2 scaling factor (test)	FID: 78.02	7
High-Resolution Image Generation	LAION-5B 4x4 scaling factor (test)	FID: 129.9	7
High-Resolution Image Generation	LAION-5B 3x3 scaling factor (test)	FID: 112.5	7
High-Resolution Image Generation	High-resolution Image Generation	FID_r: 118.6	6
Sinogram Completion	LoDoPaB	SSIM (Sinogram): 91.1	5

