Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

About

The success of recent text-to-image diffusion models is largely due to their capacity to be guided by a complex text prompt, which enables users to precisely describe the desired content. However, these models struggle to suppress undesired content, that is, content the prompt explicitly asks to omit from the generated image. In this paper, we analyze how to manipulate the text embeddings and remove unwanted content from them. We introduce two contributions, which we refer to as $\textit{soft-weighted regularization}$ and $\textit{inference-time text embedding optimization}$. The first regularizes the text embedding matrix and effectively suppresses the undesired content. The second further suppresses generation of the unwanted content and encourages generation of the desired content. We evaluate our method quantitatively and qualitatively in extensive experiments, validating its effectiveness. Furthermore, our method generalizes to both pixel-space diffusion models (e.g., DeepFloyd-IF) and latent-space diffusion models (e.g., Stable Diffusion).
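To make the idea of regularizing the text embedding matrix more concrete, here is a minimal sketch. It rests on assumptions the abstract does not state: that the regularization acts on the singular values of a (tokens x dim) embedding matrix, and that each singular value σ is rescaled to exp(-σ)·σ so that dominant directions are damped most. The function name `soft_weighted_regularization` is hypothetical, and a random toy matrix stands in for real text-encoder output.

```python
import numpy as np


def soft_weighted_regularization(embeddings: np.ndarray) -> np.ndarray:
    """Damp the dominant singular directions of a (tokens x dim) matrix.

    Assumed rule (not given in the abstract): sigma -> exp(-sigma) * sigma.
    """
    # Thin SVD: embeddings = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(embeddings, full_matrices=False)
    # Large singular values shrink sharply, small ones stay nearly unchanged.
    s_soft = np.exp(-s) * s
    # Scaling column j of u by s_soft[j] is the same as u @ diag(s_soft).
    return (u * s_soft) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in: 5 token embeddings of dimension 8 that share one
    # strong common direction (playing the role of the unwanted content).
    shared = rng.normal(size=(1, 8))
    toy = 3.0 * np.ones((5, 1)) @ shared + 0.1 * rng.normal(size=(5, 8))
    out = soft_weighted_regularization(toy)
    print("top singular value before:", np.linalg.svd(toy, compute_uv=False)[0])
    print("top singular value after: ", np.linalg.svd(out, compute_uv=False)[0])
```

Under this assumed rule every rescaled singular value is at most 1/e, so the strongest shared direction can no longer dominate the matrix. In a real pipeline the rows would presumably be the embeddings tied to the unwanted concept, and the result would be written back into the prompt embedding before denoising.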

Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang • 2024

Related benchmarks

Task	Dataset	Result
Style Unlearning	UnlearnCanvas	UA 0.569	36
Object Unlearning	UnlearnCanvas	Unlearning Accuracy (UA) 23.25	31
Object concept unlearning	UnlearnCanvas	Unlearning Accuracy (UA) 23.25	23
Concept Unlearning	UnlearnCanvas	Total Avg. Acc 72.91	22
Concept Unlearning	UnlearnCanvas object concept unlearning	Unlearning Accuracy 23.25	11
Object Unlearning	UnlearnCanvas	Unlearning Accuracy (UA) 23.25	11
