
Diff-ES: Stage-wise Structural Diffusion Pruning via Evolutionary Search

About

Diffusion models have achieved remarkable success in high-fidelity image generation but remain computationally demanding due to their multi-step denoising process and large model sizes. Although prior work improves efficiency either by reducing sampling steps or by compressing model parameters, existing structured pruning approaches still struggle to balance real acceleration against image-quality preservation. In particular, methods such as MosaicDiff rely on heuristic, manually tuned stage-wise sparsity schedules and stitch together multiple independently pruned models at inference time, which increases memory overhead. Moreover, the importance of diffusion steps is highly non-uniform and model-dependent, so schedules derived from simple heuristics or empirical observations often fail to generalize and can yield suboptimal performance. To address this, we introduce Diff-ES, a stage-wise structural Diffusion pruning framework via Evolutionary Search, which optimizes the stage-wise sparsity schedule and executes it through memory-efficient weight routing without model duplication. Diff-ES divides the diffusion trajectory into multiple stages, automatically discovers an optimal stage-wise sparsity schedule via evolutionary search, and activates stage-conditioned weights dynamically without duplicating model parameters. The framework integrates naturally with existing structured pruning methods for diffusion models, including depth and width pruning. Extensive experiments on DiT and SDXL demonstrate that Diff-ES consistently achieves wall-clock speedups with minimal degradation in generation quality, establishing state-of-the-art performance for structured diffusion-model pruning.
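The abstract's core idea, searching for a per-stage sparsity schedule under an overall sparsity budget, can be sketched as a simple evolutionary loop. This is a minimal toy illustration, not the paper's implementation: the fitness function here (`toy_fitness`) is a hypothetical stand-in for the real proxy of generation quality that Diff-ES would evaluate on the pruned diffusion model, and all hyperparameters are assumed for the example.

```python
import random

def evolutionary_search(num_stages=4, target_sparsity=0.4, pop_size=20,
                        generations=30, fitness=None, seed=0):
    """Toy evolutionary search over a stage-wise sparsity schedule.

    Each candidate is a list of per-stage sparsity ratios; `normalize`
    rescales it so the mean matches the overall sparsity budget.
    """
    rng = random.Random(seed)

    def normalize(s):
        mean = sum(s) / len(s)
        if mean == 0:
            return [target_sparsity] * len(s)
        # Rescale to the budget, clip to a maximum per-stage sparsity.
        return [min(0.95, x * target_sparsity / mean) for x in s]

    def random_schedule():
        return normalize([rng.uniform(0.0, 0.8) for _ in range(num_stages)])

    def mutate(s):
        return normalize([max(0.0, min(0.95, x + rng.gauss(0, 0.05)))
                          for x in s])

    def crossover(a, b):
        return normalize([a[i] if rng.random() < 0.5 else b[i]
                          for i in range(num_stages)])

    pop = [random_schedule() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]          # keep the top quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical proxy fitness: reward pruning later denoising stages more
# heavily than early ones (early steps tend to shape global structure).
def toy_fitness(schedule):
    return sum(s * i for i, s in enumerate(schedule))

best = evolutionary_search(fitness=toy_fitness)
```

In the actual framework, evaluating `fitness` would involve running the pruned model with the candidate schedule and scoring output quality; the loop structure (selection, crossover, mutation under a sparsity budget) is the part this sketch conveys.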

Zongfang Liu, Shengkun Tang, Zongliang Wu, Xin Yuan, Zhiqiang Shen • 2026

Related benchmarks

Task                      Dataset              Metric           Result   Rank
Image Generation          ImageNet (val)       Inception Score  268.7    247
Text-to-Image Generation  FLUX.1 Schnell       FID              25.88    3
Image Generation          DiT (~40% sparsity)  Latency (ms)     61.2     3
Image Generation          SDXL (30% sparsity)  Latency (ms)     1480     3
