Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

About

In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image generation, image editing, and 3D generation. Early research typically required training a specialized generator for each task and domain, often relying on fully labeled data. Motivated by the powerful generative capabilities and broad applications of diffusion models, we explore leveraging label-free data to empower these models for in-domain generation. Fine-tuning a pre-trained generative model on domain data is an intuitive approach, but it is challenging and often requires complex manual hyper-parameter tuning, because the limited diversity of the training data can easily disrupt the model's original generative capabilities. To address this challenge, we propose a guidance-decoupled prior preservation mechanism that achieves high generative quality and controllability using image-only data, preserving the pre-trained model's prior from a denoising guidance perspective. We decouple domain-related guidance from the conditional guidance used in classifier-free guidance, which preserves the open-world control guidance and unconditional guidance of the pre-trained model. We further propose an efficient domain knowledge learning technique that trains an additional text-free UNet copy to predict domain guidance. In addition, we theoretically illustrate a multi-guidance in-domain generation pipeline for a variety of generative tasks, leveraging multiple guidances from distinct diffusion models and conditions. Extensive experiments demonstrate the superiority of our method in domain-specific synthesis and its compatibility with various diffusion-based control methods and applications.

Pu Cao, Feng Zhou, Lu Yang, Tianrui Huang, Qing Song • 2023
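
The abstract describes decoupling domain guidance from the conditional guidance of classifier-free guidance. The following is a minimal sketch of how such a multi-guidance combination might look at one denoising step; it is not the authors' implementation. The function name `combine_guidance`, the scale parameters, and the choice to define domain guidance as the difference between the text-free domain copy's prediction and the pre-trained unconditional prediction are all assumptions made for illustration. Noise predictions are represented as flat lists of floats to keep the example dependency-free.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_text: Sequence[float],
    eps_domain: Sequence[float],
    text_scale: float = 7.5,
    domain_scale: float = 1.0,
) -> List[float]:
    """Combine three noise predictions into one guided prediction.

    eps_uncond: unconditional prediction of the pre-trained model.
    eps_text:   text-conditioned prediction of the pre-trained model.
    eps_domain: prediction of a hypothetical text-free domain model.

    ASSUMPTION: domain guidance is taken as (eps_domain - eps_uncond) and
    added alongside the usual classifier-free text guidance term. The
    source abstract does not give the exact formula.
    """
    if not (len(eps_uncond) == len(eps_text) == len(eps_domain)):
        raise ValueError("all noise predictions must have the same length")

    return [
        u + text_scale * (t - u) + domain_scale * (d - u)
        for u, t, d in zip(eps_uncond, eps_text, eps_domain)
    ]


# With domain_scale=0 the expression reduces to standard classifier-free guidance.
guided = combine_guidance([0.0, 1.0], [1.0, 1.0], [0.5, 2.0],
                          text_scale=2.0, domain_scale=1.0)
```

Keeping the two guidance terms separate means each scale can be tuned independently: under these assumptions, the text term still comes entirely from the untouched pre-trained model, and only the domain term depends on the additional copy.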

Related benchmarks

Task	Dataset	Result
Image Generation	Faces	FID 6.57	18
Unconditional Generation	Animal	FID 18.82	6
Unconditional Generation	Porcelain	FID 56.46	6
Text-Guided Generation	Porcelain	Alignment Score 89	6
Spatial-Guided Generation	CelebA-HQ	Alignment Score 20	6

