StyleShot: A Snapshot on Any Style
About
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, the style-aware encoder is trained to extract expressive style representations using a decoupling training strategy, and StyleGallery enables generalization. We further employ a content-fusion encoder to enhance image-driven style transfer. Our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The project page is available at: https://styleshot.github.io/.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Style Transfer | User Study | Overall Quality Score | 76.6 | 30 |
| Style Transfer | CIFAR-100 and InstaStyle (test) | Content Score | 26.9 | 9 |
| Style Transfer | Style-Content Pairs 50 style x 40 content references (test) | CSD Score | 0.45 | 8 |
| Text-driven Style Transfer | Benchmark of 52 prompts and 20 style images 1.0 (test) | Text Alignment | 0.202 | 8 |
| Image-driven Style Transfer | Image-driven style transfer (evaluation set) | CLIP Alignment Score | 0.66 | 7 |
| Style Transfer | Single image on A100 GPU (test) | Inference Time (s) | 5 | 7 |
| Text-driven Style Transfer | User preference study set (test) | Human Preference (Text) | 44.3 | 6 |