CLIPstyler: Image Style Transfer with a Single Text Condition

About

Existing neural style transfer methods require a reference style image in order to transfer its texture information to a content image. In many practical situations, however, users have no reference style image at hand but would still like to transfer a style that they can only imagine. To handle such applications, we propose a new framework that enables style transfer without a style image, using only a text description of the desired style. Using CLIP, a pre-trained text-image embedding model, we demonstrate that the style of a content image can be modulated with a single text condition alone. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experiments confirm successful image style transfer with realistic textures that reflect the semantics of the query text.

Gihyun Kwon, Jong Chul Ye • 2021
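
The patch-wise text-image matching loss mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formula, so the code below assumes a plain cosine-distance loss averaged over random patches and their augmented views. The encoder `toy_embed`, the hand-picked `text_embedding`, and the horizontal flip used as the "multiview" augmentation are all hypothetical stand-ins; a real setup would use CLIP's image and text encoders and whichever augmentations the paper specifies.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_patch(image, size, rng):
    """Crop a random size x size patch from a list-of-rows grayscale image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def horizontal_flip(patch):
    """Stand-in augmentation (assumption): mirror the patch left to right."""
    return [row[::-1] for row in patch]


def patch_text_loss(image, text_embedding, embed_image,
                    patch_size=4, num_patches=8, seed=0):
    """Mean cosine distance between augmented patch embeddings and the text."""
    rng = random.Random(seed)
    losses = []
    for _ in range(num_patches):
        patch = random_patch(image, patch_size, rng)
        # Score every view of the patch against the same text embedding.
        for view in (patch, horizontal_flip(patch)):
            similarity = cosine_similarity(embed_image(view), text_embedding)
            losses.append(1.0 - similarity)
    return sum(losses) / len(losses)


def toy_embed(patch):
    """Hypothetical encoder: [mean intensity, mean horizontal contrast]."""
    pixels = [p for row in patch for p in row]
    diffs = [abs(row[i + 1] - row[i]) for row in patch
             for i in range(len(row) - 1)]
    return [sum(pixels) / len(pixels), sum(diffs) / len(diffs)]


# Toy data: an 8x8 image of vertical stripes and an 8x8 flat image.
striped = [[float(col % 2) for col in range(8)] for _ in range(8)]
flat = [[1.0] * 8 for _ in range(8)]

# Hand-picked embedding standing in for an encoded "striped" text prompt.
text_embedding = [0.5, 1.0]

striped_loss = patch_text_loss(striped, text_embedding, toy_embed)
flat_loss = patch_text_loss(flat, text_embedding, toy_embed)
```

In this toy setting the striped image scores a lower loss than the flat one, because every one of its patches matches the "striped" embedding. Scoring local patches rather than the whole image is what lets such a loss respond to texture at every location.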

Related benchmarks

Task                         Dataset                                           Result             Rank
Semantic segmentation        Cityscapes                                        mIoU 32.4          578
Semantic segmentation        ACDC (test)                                       mIoU 36.75         47
Semantic segmentation        ACDC (Night)                                      mIoU 21.38         38
Semantic segmentation        ACDC (Rain)                                       mIoU 38.7          31
Semantic segmentation        GTA5                                              mIoU 38.73         28
Semantic segmentation        ACDC Snow                                         mIoU 41.09         26
Semantic segmentation        ACDC Snow (test)                                  mIoU 41            20
Affective Image Stylization  EmoEdit (inference)                               CLIP Score 0.709   11
Affective Image Filter       AIF                                               SSIM 52.49         11
Text-driven Style Transfer   Custom Stylized Images 10 text conditions (test)  CLIP Score 0.2515  7
Showing 10 of 20 rows
