
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

About

In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image generation, image editing, and 3D generation. Early research typically required training a specialized generator for each task and domain, often relying on fully labeled data. Motivated by the powerful generative capabilities and broad applications of diffusion models, we explore how label-free data can empower these models for in-domain generation. Fine-tuning a pre-trained generative model on domain data is an intuitive but challenging approach: it often requires complex manual hyper-parameter tuning, because the limited diversity of the training data can easily disrupt the model's original generative capabilities. To address this challenge, we propose a guidance-decoupled prior preservation mechanism that achieves high generative quality and controllability using image-only data, by preserving the pre-trained model from a denoising guidance perspective. We decouple domain-related guidance from the conditional guidance used in classifier-free guidance, so that the open-world control guidance and the unconditional guidance of the pre-trained model are preserved. We further propose an efficient domain knowledge learning technique that trains an additional text-free UNet copy to predict the domain guidance. In addition, we describe, on theoretical grounds, a multi-guidance in-domain generation pipeline for a variety of generative tasks, which combines multiple guidances from distinct diffusion models and conditions. Extensive experiments demonstrate the superiority of our method in domain-specific synthesis and its compatibility with various diffusion-based control methods and applications.
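The idea of decoupling domain guidance from text guidance can be illustrated with a minimal sketch. It assumes the guidances are combined additively, in the same way classifier-free guidance adds a scaled conditional direction to the unconditional noise prediction. The abstract does not give the exact formula, so the function name `combine_guidance`, the weights `w_text` and `w_domain`, and the additive form itself are illustrative assumptions rather than the authors' implementation. Noise predictions are represented here as flat lists of floats instead of UNet output tensors.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_text: Sequence[float],
    eps_domain: Sequence[float],
    w_text: float = 7.5,
    w_domain: float = 1.0,
) -> List[float]:
    """Combine three noise predictions into one guided prediction.

    Assumed (hypothetical) roles of the inputs:
      eps_uncond: pre-trained model, no condition
      eps_text:   pre-trained model, text condition
      eps_domain: separate text-free model trained on domain images
    """
    return [
        # Start from the unconditional prediction, then add each
        # guidance direction scaled by its own weight.
        u + w_text * (t - u) + w_domain * (d - u)
        for u, t, d in zip(eps_uncond, eps_text, eps_domain)
    ]


# Toy two-element "noise predictions" for illustration only.
eps_uncond = [0.0, 1.0]
eps_text = [1.0, 1.0]
eps_domain = [0.0, 3.0]

guided = combine_guidance(eps_uncond, eps_text, eps_domain, w_text=2.0, w_domain=0.5)
```

With `w_domain=0.0` the sketch reduces to ordinary classifier-free guidance, which is one way to read the claim that the pre-trained model's text and unconditional guidance are left untouched: the domain term is an extra direction with its own weight, not a change to the other two.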

Pu Cao, Feng Zhou, Lu Yang, Tianrui Huang, Qing Song • 2023

Related benchmarks

Task                       Dataset     Result               Rank
Image Generation           Faces       FID 6.57             18
Unconditional Generation   Animal      FID 18.82            6
Unconditional Generation   Porcelain   FID 56.46            6
Text-guided generation     Porcelain   Alignment Score 89   6
Spatial-Guided Generation  CelebA-HQ   Alignment Score 20   6
