Style Aligned Image Generation via Shared Attention
About
Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
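The core idea of attention sharing can be illustrated with a minimal sketch: during a self-attention step, each generated image's queries attend not only over its own keys and values but also over those of a reference image, so stylistic statistics propagate across the batch. The snippet below is an illustrative NumPy toy, not the paper's implementation; the function name and the single-head, unnormalized setup are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Single-head attention where the target image's queries attend
    over its own keys/values concatenated with the reference image's
    (a toy version of 'attention sharing' across a batch)."""
    k = np.concatenate([k_tgt, k_ref], axis=0)   # (n_tgt + n_ref, d)
    v = np.concatenate([v_tgt, v_ref], axis=0)   # (n_tgt + n_ref, d)
    d = q_tgt.shape[-1]
    scores = q_tgt @ k.T / np.sqrt(d)            # (n_tgt, n_tgt + n_ref)
    return softmax(scores, axis=-1) @ v          # (n_tgt, d)

# Toy usage: 4 target tokens, 6 reference tokens, feature dim 8.
rng = np.random.default_rng(0)
q_tgt, k_tgt, v_tgt = (rng.standard_normal((4, 8)) for _ in range(3))
k_ref, v_ref = (rng.standard_normal((6, 8)) for _ in range(2))
out = shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref)
```

Because the reference keys and values sit in the same softmax as the target's own, the output at every spatial location is a convex mixture that includes reference features, which is what nudges the generated images toward a shared style.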
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Style Transfer | User Study | Overall Quality Score | 74.4 | 30 |
| Style aligned image generation | 100 text prompts (test) | Text Alignment (CLIP Score) | 28.9 | 11 |
| Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM | 0.4125 | 11 |
| Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM | 0.2371 | 10 |
| Style Transfer | CIFAR-100 and InstaStyle (test) | Content Score | 28.1 | 9 |
| Text-to-Image Generation | In-the-wild image color condition | FID | 73.1 | 7 |
| Preference-conditioned image generation | PREFBENCH | FID | 167.5 | 7 |
| Preference-conditioned image generation | Pick-a-Pic processed | FID | 200.3 | 7 |
| Text-to-Image Generation | Sampled color condition (Manual) | FID | 177 | 7 |
| Style Transfer | Single image on A100 GPU (test) | Inference Time (s) | 18 | 7 |