
Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers

About

Diffusion Transformers (DiTs) have shown exceptional performance in image generation, yet their large parameter counts incur high computational costs, impeding deployment in resource-constrained settings. To address this, we propose Pluggable Pruning with Contiguous Layer Distillation (PPCL), a flexible structured pruning framework specifically designed for DiT architectures. First, we identify redundant layer intervals through a linear probing mechanism combined with first-order differential trend analysis of similarity metrics. Subsequently, we propose a plug-and-play teacher-student alternating distillation scheme tailored to integrate depth-wise and width-wise pruning within a single training phase. This distillation framework enables flexible knowledge transfer across diverse pruning ratios, eliminating the need for per-configuration retraining. Extensive experiments on multiple Multi-Modal Diffusion Transformer (MM-DiT) models demonstrate that PPCL achieves a 50% reduction in parameter count compared to the full model, with less than 3% degradation in key objective metrics. Notably, our method maintains high-quality image generation capabilities while achieving higher compression ratios, rendering it well-suited for resource-constrained environments. The open-source code and checkpoints for PPCL are available at: https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning.
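The redundant-interval step described above can be illustrated with a minimal sketch: given per-layer similarity scores (e.g. cosine similarity between each block's input and output hidden states, measured by probing), take the first-order difference of the similarity curve and pick the contiguous window of layers where it stays flattest. The function name, window scoring, and synthetic scores below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def find_redundant_interval(layer_sims, k):
    """Return the start index of the k contiguous layers whose similarity
    curve is flattest, i.e. whose first-order differences are smallest.

    layer_sims : per-layer similarity scores (higher = the layer changes
                 its input less); illustrative stand-in for probed metrics.
    k          : number of contiguous layers to mark as redundant.
    """
    # First-order differential trend of the similarity curve.
    diffs = np.abs(np.diff(np.asarray(layer_sims, dtype=float)))
    # Score each window of k consecutive layers by its total change;
    # the window with the least change is the most redundant interval.
    window_scores = [diffs[i:i + k].sum() for i in range(len(diffs) - k + 1)]
    return int(np.argmin(window_scores))

# Synthetic example: layers 1-2 barely alter their inputs.
sims = [0.20, 0.90, 0.91, 0.92, 0.30]
start = find_redundant_interval(sims, k=2)  # -> 1
```

In a real pipeline the chosen interval would then be dropped (depth-wise pruning) and the pruned model distilled against the teacher, as the abstract describes.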

Jian Ma, Qirong Peng, Xujie Zhu, Peixing Xie, Chen Chen, Haonan Lu • 2025

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | GenEval | GenEval Score | 84.7 | 277 |
| Text-to-Image Generation | DPG | Overall Score | 87.9 | 131 |
| Text-to-Image Generation | DPG-Bench | DPG Score | 87.9 | 89 |
| Text-to-Image Generation | GenEval | Overall Score | 78.4 | 68 |
| Text-to-Image Generation | OneIG-Bench | -- | -- | 33 |
| Spatial Reasoning Generation | OneIG-EN (test) | Alignment Score | 83.9 | 26 |
| Text-to-Image Generation | OneIG-ZH | Alignment | 85.4 | 24 |
| Text-to-Image Generation | T2I-CompBench | B-VQA Score | 75 | 16 |
| Text-to-Image Generation | GenEval | GenEval Score | 84.7 | 16 |
| Long-text-to-Image Generation | LongText-Bench | EN Score | 92.9 | 15 |

Showing 10 of 13 rows.
