
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

About

Recently, a series of diffusion-aware distillation algorithms have emerged to reduce the computational overhead of the multi-step inference process of Diffusion Models (DMs). Current distillation techniques fall into two categories: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that combines the advantages of ODE Trajectory Preservation and Reformulation while maintaining near-lossless performance during step compression. First, we introduce Trajectory Segmented Consistency Distillation, which progressively performs consistency distillation within pre-defined time-step segments and thereby preserves the original ODE trajectory from a higher-order perspective. Second, we incorporate human feedback learning to boost the performance of the model in the low-step regime and mitigate the performance loss incurred by the distillation process. Third, we integrate score distillation to further improve the low-step generation capability of the model, and we offer the first attempt to use a unified LoRA to support inference at all step counts. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in 1-step inference.

Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao • 2024
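
The abstract describes consistency distillation performed within pre-defined time-step segments. The sketch below is only an illustration of that idea, not the authors' implementation. It assumes equal-width segments over a 1000-step training schedule, and it assumes that a timestep is mapped to the lower boundary of its own segment; the function names `segment_bounds` and `segment_target` are hypothetical.

```python
def segment_bounds(num_train_steps: int, num_segments: int) -> list[int]:
    """Return num_segments + 1 equally spaced boundaries covering [0, num_train_steps].

    Equal-width segments are an assumption made for this illustration.
    """
    if num_segments <= 0 or num_train_steps % num_segments != 0:
        raise ValueError("num_train_steps must be a positive multiple of num_segments")
    width = num_train_steps // num_segments
    return [i * width for i in range(num_segments + 1)]


def segment_target(t: int, bounds: list[int]) -> int:
    """Return the lower boundary of the segment (lo, hi] that contains timestep t.

    In this sketch, that boundary is the timestep a segment-wise consistency
    function would be trained to map to (an assumption, see the lead-in).
    """
    for lo, hi in zip(bounds, bounds[1:]):
        if lo < t <= hi:
            return lo
    raise ValueError(f"timestep {t} is outside ({bounds[0]}, {bounds[-1]}]")


if __name__ == "__main__":
    # With 4 segments, timestep 600 belongs to (500, 750] and targets 500.
    bounds = segment_bounds(1000, 4)
    print(bounds, segment_target(600, bounds))
```

With a single segment, every timestep targets 0, which corresponds to an ordinary consistency model over the whole trajectory. Shrinking the number of segments stage by stage is one way to read "progressively" in the abstract; the exact schedule is not stated on this page.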

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-Image Generation | GenEval | GenEval Score | 56 | 277
Text-to-Image Generation | GenEval (test) | Two Obj. Acc | 77 | 169
Text-to-Image Generation | T2I-CompBench (test) | Color Accuracy | 65.35 | 67
Text-to-Image Generation | MS-COCO 30K (test) | FID | 30.87 | 41
Text-to-Image Generation | Text-to-Image Generation | CLIP Score | 0.3029 | 34
Text-to-Image Generation | MS-COCO 5K 2017 (val) | FID | 30.38 | 34
Text-to-Image Generation | OneIG-Bench | Alignment | 0.79 | 33
Composition Image Generation | GenEval | GenEval Score | 70.03 | 20
Text-to-Image Generation | MS-COCO 10K prompts 2014 (val) | FID | 29.8 | 19
Text-to-Image Generation | HPS prompt set v2 | CLIP Score | 0.285 | 11
Showing 10 of 12 rows
