Attention Distillation: A Unified Approach to Visual Characteristics Transfer
About
Recent advances in generative diffusion models have shown a notable inherent understanding of image style and semantics. In this paper, we leverage the self-attention features from pretrained diffusion networks to transfer the visual characteristics from a reference to generated images. Unlike previous work that uses these features as plug-and-play attributes, we propose a novel attention distillation loss calculated between the ideal and current stylization results, based on which we optimize the synthesized image via backpropagation in latent space. Next, we propose an improved Classifier Guidance that integrates attention distillation loss into the denoising sampling process, further accelerating the synthesis and enabling a broad range of image generation applications. Extensive experiments have demonstrated the extraordinary performance of our approach in transferring the examples' style, appearance, and texture to new images in synthesis. Code is available at https://github.com/xugao97/AttentionDistillation.
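The abstract does not spell out the loss, so the following is only a minimal sketch of one plausible reading. It assumes the "current" result is the synthesized image's own self-attention output, the "ideal" result reuses the synthesized image's queries with the reference's keys and values, and the two are compared with a mean absolute difference. All function names are hypothetical, and random arrays stand in for features that would come from a pretrained diffusion network. It is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention for 2D arrays of shape (tokens, dim).
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax((q @ k.T) * scale, axis=-1)
    return weights @ v

def attention_distillation_loss(q_t, k_t, v_t, k_ref, v_ref):
    # Assumption: "current" is the target's own self-attention output.
    current = attention(q_t, k_t, v_t)
    # Assumption: "ideal" re-aggregates the reference's values with the target's queries.
    ideal = attention(q_t, k_ref, v_ref)
    # Mean absolute difference, an L1-style distance between the two outputs.
    return float(np.abs(current - ideal).mean())

# Random placeholder features (hypothetical sizes), not real diffusion features.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
q_t, k_t, v_t = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
k_ref, v_ref = (rng.standard_normal((n_tokens, dim)) for _ in range(2))

loss = attention_distillation_loss(q_t, k_t, v_t, k_ref, v_ref)
# Using the target's own keys and values as the "reference" makes both terms identical.
self_loss = attention_distillation_loss(q_t, k_t, v_t, k_t, v_t)
```

In the approach described above, a loss of this kind would be minimized with respect to the latent of the synthesized image via backpropagation, which requires an autodiff framework; this sketch only evaluates the loss.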
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Style Transfer | MS-COCO and WikiArt 1,000 images each | ArtFID | 16.17 | 11 |
| Image Style Transfer | Style Transfer 750 images (test) | Style Score | 0.5249 | 10 |
| Style Transfer | Style Transfer Evaluation Set (test) | Style Score | 85.59 | 8 |
| Style Transfer | Pinterest Styles 1.0 (test) | CSD | 0.64 | 8 |
| Style Transfer | BCS-Bench | DINO | 0.6111 | 8 |
| Style Transfer | User Study | Rank 1 Score | 9.17 | 8 |
| Style Transfer | User Study 10 style transfer results (test) | Visual Preference Score | 3.77 | 3 |