
Style Aligned Image Generation via Shared Attention

About

Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal "attention sharing" during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
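The core idea of attention sharing can be illustrated with a toy sketch: during a self-attention step, each generated image's queries attend not only to its own keys and values but also to those of a reference image, so style features propagate across the batch. The sketch below is our own simplified illustration (function names, shapes, and the single-head formulation are assumptions, not the paper's implementation, which operates inside a diffusion model's attention layers):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_attention(q, k, v, k_ref, v_ref):
    """Toy single-head shared attention.

    q, k, v         : (n, d) queries/keys/values of the target image.
    k_ref, v_ref    : (m, d) keys/values of the reference image.
    Each target query attends over the reference tokens concatenated
    with the target's own tokens, letting reference style leak in.
    """
    k_all = np.concatenate([k_ref, k], axis=0)        # (m + n, d)
    v_all = np.concatenate([v_ref, v], axis=0)        # (m + n, d)
    scores = q @ k_all.T / np.sqrt(q.shape[-1])       # (n, m + n)
    return softmax(scores) @ v_all                    # (n, d)
```

With `m = 0` (empty reference), this reduces to ordinary self-attention over the target's own tokens, which is why the paper can describe the modification as "minimal": only the key/value set is extended.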

Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Style Transfer | User Study | Overall Quality Score | 74.4 | 30 |
| Style aligned image generation | 100 text prompts (test) | Text Alignment (CLIP Score) | 28.9 | 11 |
| Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM | 0.4125 | 11 |
| Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM | 0.2371 | 10 |
| Style Transfer | CIFAR-100 and InstaStyle (test) | Content Score | 28.1 | 9 |
| Text-to-Image Generation | In-the-wild image color condition | FID | 73.1 | 7 |
| Preference-conditioned image generation | PREFBENCH | FID | 167.5 | 7 |
| Preference-conditioned image generation | Pick-a-Pic processed | FID | 200.3 | 7 |
| Text-to-Image Generation | Sampled color condition (Manual) | FID | 177 | 7 |
| Style Transfer | Single image on A100 GPU (test) | Inference Time (s) | 18 | 7 |

Showing 10 of 14 rows.

Other info

Code
