
Controlling Text-to-Image Diffusion by Orthogonal Finetuning

About

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
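The energy-preservation claim in the abstract can be checked numerically on a toy weight matrix. The sketch below is illustrative only and is not taken from the listing: it assumes that neurons are the columns of a weight matrix, that hyperspherical energy is the sum of inverse pairwise distances between unit-normalized neurons, and that the orthogonal matrix is built with a Cayley transform of a skew-symmetric matrix. The function names `cayley_orthogonal` and `hyperspherical_energy` are hypothetical helpers introduced here.

```python
import numpy as np

def cayley_orthogonal(A):
    """Build an orthogonal matrix from an arbitrary square matrix A (assumed parameterization)."""
    Q = 0.5 * (A - A.T)                      # skew-symmetric part of A
    I = np.eye(A.shape[0])
    # Cayley transform: I - Q is invertible for any real skew-symmetric Q
    return (I + Q) @ np.linalg.inv(I - Q)

def hyperspherical_energy(W, eps=1e-12):
    """Sum of inverse pairwise distances between unit-normalized columns of W (assumed definition)."""
    W_hat = W / np.linalg.norm(W, axis=0, keepdims=True)
    diff = W_hat[:, :, None] - W_hat[:, None, :]     # shape (d, n, n)
    dist = np.linalg.norm(diff, axis=0)              # pairwise distances, shape (n, n)
    upper = np.triu_indices(W.shape[1], k=1)         # each unordered pair once
    return float(np.sum(1.0 / (dist[upper] + eps)))

rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 5))                     # toy "pretrained" weights: 5 neurons in 8 dims

R = cayley_orthogonal(0.1 * rng.standard_normal((8, 8)))
W_oft = R @ W0                                       # orthogonal (rotation-style) update
W_add = W0 + 0.1 * rng.standard_normal(W0.shape)     # generic additive update, for contrast

energy_before = hyperspherical_energy(W0)
energy_oft = hyperspherical_energy(W_oft)
energy_add = hyperspherical_energy(W_add)

print("orthogonality error:", np.abs(R.T @ R - np.eye(8)).max())
print("energy change, orthogonal update:", abs(energy_oft - energy_before))
print("energy change, additive update:  ", abs(energy_add - energy_before))
```

Because an orthogonal map preserves every pairwise angle between neurons, the energy change under the orthogonal update should be at floating-point noise level, while an unconstrained additive update generally shifts it. The COFT radius constraint is not modeled in this sketch.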

Zeju Qiu, Weiyang Liu, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc): 92.8 | 518 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 69 | 202 |
| Segmentation | ADE20K | mIoU: 27.06 | 59 |
| Image Generation | Faces | FID: 27.5 | 18 |
| Fine-tuning | 1D Convection-Diffusion-Reaction (CDR) Equation (train) | Train Loss: 12.61 | 14 |
| Fine-tuning | 1D Convection-Diffusion-Reaction (CDR) Equation (test) | Test Loss: 12.28 | 14 |
| PDE solving | CDR Equation (beta=1, nu=1, rho=1) | Relative L2 Error: 1.27e+3 | 12 |
| PDE solving | CDR Equation (beta=3, nu=1, rho=1) | Relative L2 Error: 1.26e+3 | 12 |
| PDE solving | CDR Equation (beta=5, nu=1, rho=1) | Relative L2 Error: 1.32e+3 | 12 |
| Canny edge to image | COCO | IoU: 19.5 | 6 |
Showing 10 of 21 rows
