MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

About

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results. For example, generation approaches usually fail to synthesize multiple images of the same objects/characters but with different views or poses. Meanwhile, existing editing methods either fail to achieve effective complex non-rigid editing while maintaining the overall textures and identity, or require time-consuming fine-tuning to capture the image-specific appearance. In this paper, we develop MasaCtrl, a tuning-free method to achieve consistent image generation and complex non-rigid image editing simultaneously. Specifically, MasaCtrl converts existing self-attention in diffusion models into mutual self-attention, so that it can query correlated local contents and textures from source images for consistency. To further alleviate the query confusion between foreground and background, we propose a mask-guided mutual self-attention strategy, where the mask can be easily extracted from the cross-attention maps. Extensive experiments show that the proposed MasaCtrl can produce impressive results in both consistent image generation and complex non-rigid real image editing.
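The mutual self-attention idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it is one plausible reading of the abstract, in which queries from the image being generated or edited attend to keys and values taken from the source image, and an optional foreground mask keeps foreground queries on foreground source content and background queries on background content. The function names, array shapes, and the way the masks are applied are all assumptions made for illustration; in the method itself the masks are said to come from cross-attention maps, which this sketch does not model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, key_mask=None):
    # Scaled dot-product attention. key_mask (bool, shape [n_keys]) restricts
    # which keys can be attended to; masked keys get a very low score.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if key_mask is not None:
        scores = np.where(key_mask[None, :], scores, -1e9)
    return softmax(scores) @ v

def mutual_self_attention(q_tgt, k_src, v_src, src_fg_mask=None, tgt_fg_mask=None):
    # Hypothetical helper: target queries read from SOURCE keys/values
    # instead of the target's own keys/values.
    if src_fg_mask is None or tgt_fg_mask is None:
        return attention(q_tgt, k_src, v_src)
    # Assumed mask-guided variant: attend to source foreground and source
    # background separately, then pick per target token using the target mask.
    out_fg = attention(q_tgt, k_src, v_src, key_mask=src_fg_mask)
    out_bg = attention(q_tgt, k_src, v_src, key_mask=~src_fg_mask)
    return np.where(tgt_fg_mask[:, None], out_fg, out_bg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q_tgt = rng.normal(size=(5, 8))   # 5 target tokens, feature dim 8
    k_src = rng.normal(size=(6, 8))   # 6 source tokens
    v_src = rng.normal(size=(6, 4))
    src_fg = np.array([True, True, True, False, False, False])
    tgt_fg = np.array([True, False, True, False, True])
    out = mutual_self_attention(q_tgt, k_src, v_src, src_fg, tgt_fg)
    print(out.shape)
```

Because no weights are trained or updated, the sketch is consistent with the "tuning-free" framing: only the source of the keys and values changes at inference time.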

Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng • 2023

Related benchmarks

Task | Dataset | Result | Rank
Image Editing | PIE-Bench | PSNR 22.78 | 116
Image Editing | PIE-Bench (test) | PSNR 22.19 | 46
Semantic Editing | LSUN church | CLIP-Score 0.219 | 28
Image-to-Image Translation (Appearance Divergence) | LAION Mini | Structure Similarity 94.1 | 20
Image-to-Image Translation (Appearance Consistency) | LAION Mini | Structure Similarity 0.937 | 20
Image Semantic Editing | PIE-Bench (test) | PSNR 22.2 | 18
Image Editing | PIE-Bench | Distance 10324.46 | 17
Text-Guided Image Editing | General Image Editing | Speedup 1.12 | 12
Image Editing | SNR-Bench 1.0 (test) | Reward Model Structural Score 3.04 | 12
Image Editing | ImageNet real-edit | CS Score 31.4 | 11
Showing 10 of 36 rows
