OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

About

Large-scale text-to-image diffusion models, while powerful, suffer from prohibitive computational cost. Existing one-shot network pruning methods are difficult to apply to them directly because of the iterative denoising nature of diffusion models. To bridge this gap, this paper presents OBS-Diff, a novel one-shot pruning framework that enables accurate and training-free compression of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex architectures of modern diffusion models and supporting diverse pruning granularities, including unstructured, N:M semi-structured, and structured (MHA heads and FFN neurons) sparsity; (ii) to align the pruning criteria with the iterative dynamics of the diffusion process, we examine the problem from an error-accumulation perspective and propose a novel timestep-aware Hessian construction that incorporates a logarithmic-decrease weighting scheme, assigning greater importance to earlier timesteps to mitigate potential error accumulation; and (iii) a computationally efficient group-wise sequential pruning strategy is proposed to amortize the expensive calibration process. Extensive experiments show that OBS-Diff achieves state-of-the-art one-shot pruning for diffusion models, delivering inference acceleration with minimal degradation in visual quality.
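To make the two core ideas above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear layer whose inputs are recorded at each denoising step, builds a Hessian as a weighted sum of per-step input covariances, and then applies the textbook OBS rule to one weight row with unstructured sparsity. The function names, the specific weight formula log(T + 1 - t), and the damping term are all illustrative assumptions; the abstract only states that the weighting decreases logarithmically and favors earlier timesteps.

```python
import numpy as np

def log_decrease_weights(num_steps):
    """Hypothetical schedule: weights fall off logarithmically with step index.

    Step 0 is the earliest denoising step and receives the largest weight.
    The exact formula is an assumption, not taken from the paper.
    """
    steps = np.arange(num_steps)
    w = np.log(num_steps + 1 - steps)  # log(T+1), log(T), ..., log(2)
    return w / w.sum()

def weighted_hessian(inputs_per_step, weights, damp=1e-2):
    """Timestep-weighted layer Hessian: H = sum_t w_t * X_t^T X_t / n_t.

    A small multiple of the identity is added so H is invertible
    (an assumed, commonly used stabilization).
    """
    d = inputs_per_step[0].shape[1]
    H = np.zeros((d, d))
    for X, w in zip(inputs_per_step, weights):
        H += w * (X.T @ X) / X.shape[0]
    H += damp * np.mean(np.diag(H)) * np.eye(d)
    return H

def obs_prune_row(w, H, sparsity):
    """Greedy OBS on one weight row (unstructured sparsity).

    Repeatedly removes the weight with the smallest saliency
    w_q^2 / [H^-1]_qq and compensates the remaining weights.
    """
    w = np.asarray(w, dtype=float).copy()
    Hinv = np.linalg.inv(H)
    pruned = np.zeros(w.size, dtype=bool)
    for _ in range(int(round(sparsity * w.size))):
        saliency = np.full(w.size, np.inf)
        keep = ~pruned
        saliency[keep] = w[keep] ** 2 / np.diag(Hinv)[keep]
        q = int(np.argmin(saliency))
        # OBS compensation: delta_w = -(w_q / [H^-1]_qq) * H^-1[:, q]
        w -= (w[q] / Hinv[q, q]) * Hinv[:, q]
        # Remove index q from the inverse Hessian (one elimination step)
        Hinv -= np.outer(Hinv[:, q], Hinv[q, :]) / Hinv[q, q]
        Hinv[q, :] = 0.0
        Hinv[:, q] = 0.0
        pruned[q] = True
        w[pruned] = 0.0  # clear floating-point residue on pruned entries
    return w, pruned
```

Under these assumptions, `obs_prune_row(w, weighted_hessian(Xs, log_decrease_weights(len(Xs))), 0.5)` would return a row with half of its entries zeroed and the rest adjusted to compensate. N:M and structured variants would restrict which indices may be selected at each step; they are not shown here.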

Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang • 2025

Related benchmarks

Task	Dataset	Result
Image Generation	CIFAR-10 32x32	FID 7.55	151
Text-to-Image Generation	MS-COCO 2014 (val)	FID 27.2	143
Text to Image	MS-COCO 5K prompts 2014 (val)	FID 29.15	23
Text-to-Image Generation	SD 3-medium (2B) (evaluation)	CLIP Score 0.3168	11
Text-to-Image Generation	SDXL U-Net (test)	FID 29.08	10

