
Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers

About

Diffusion Transformers (DiTs) have shown exceptional performance in image generation, yet their large parameter counts incur high computational costs, impeding deployment in resource-constrained settings. To address this, we propose Pluggable Pruning with Contiguous Layer Distillation (PPCL), a flexible structured pruning framework specifically designed for DiT architectures. First, we identify redundant layer intervals through a linear probing mechanism combined with first-order differential trend analysis of similarity metrics. Subsequently, we propose a plug-and-play teacher-student alternating distillation scheme tailored to integrate depth-wise and width-wise pruning within a single training phase. This distillation framework enables flexible knowledge transfer across diverse pruning ratios, eliminating the need for per-configuration retraining. Extensive experiments on multiple Multi-Modal Diffusion Transformer (MM-DiT) models demonstrate that PPCL achieves a 50% reduction in parameter count compared to the full model, with less than 3% degradation in key objective metrics. Notably, our method maintains high-quality image generation capabilities while achieving higher compression ratios, rendering it well-suited for resource-constrained environments. The open-source code and checkpoints for PPCL are available at: https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning.
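The redundant-interval step described above can be illustrated with a minimal sketch: given per-layer similarity scores (e.g. cosine similarity between each block's input and output hidden states, measured by probing), take the first-order difference of the similarity curve and pick the contiguous window of layers where it stays flattest. The function name, window scoring, and synthetic scores below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def find_redundant_interval(layer_sims, k):
    """Return the start index of the k contiguous layers whose similarity
    curve is flattest, i.e. whose first-order differences are smallest.

    layer_sims : per-layer similarity scores (higher = the layer changes
                 its input less); illustrative stand-in for probed metrics.
    k          : number of contiguous layers to mark as redundant.
    """
    # First-order differential trend of the similarity curve.
    diffs = np.abs(np.diff(np.asarray(layer_sims, dtype=float)))
    # Score each window of k consecutive layers by its total change;
    # the window with the least change is the most redundant interval.
    window_scores = [diffs[i:i + k].sum() for i in range(len(diffs) - k + 1)]
    return int(np.argmin(window_scores))

# Synthetic example: layers 1-2 barely alter their inputs.
sims = [0.20, 0.90, 0.91, 0.92, 0.30]
start = find_redundant_interval(sims, k=2)  # -> 1
```

In a real pipeline the chosen interval would then be dropped (depth-wise pruning) and the pruned model distilled against the teacher, as the abstract describes.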

Jian Ma, Qirong Peng, Xujie Zhu, Peixing Xie, Chen Chen, Haonan Lu • 2025

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | GenEval | GenEval Score | 84.7 | 277 |
| Text-to-Image Generation | DPG | Overall Score | 87.9 | 131 |
| Text-to-Image Generation | DPG-Bench | DPG Score | 87.9 | 89 |
| Text-to-Image Generation | GenEval | Overall Score | 78.4 | 68 |
| Text-to-Image Generation | OneIG-Bench | -- | -- | 33 |
| Spatial Reasoning Generation | OneIG-EN (test) | Alignment Score | 83.9 | 26 |
| Text-to-Image Generation | OneIG-ZH | Alignment | 85.4 | 24 |
| Text-to-Image Generation | T2I-CompBench | B-VQA Score | 75 | 16 |
| Text-to-Image Generation | GenEval | GenEval Score | 84.7 | 16 |
| Long-text-to-Image Generation | LongText-Bench | EN Score | 92.9 | 15 |

Showing 10 of 13 rows.
