CLIPstyler: Image Style Transfer with a Single Text Condition

About

Existing neural style transfer methods require reference style images to transfer the texture information of style images to content images. However, in many practical situations, users may not have reference style images but may still want to transfer styles they can only imagine. To handle such applications, we propose a new framework that enables style transfer "without" a style image, using only a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate modulation of the style of content images with only a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirm successful image style transfer with realistic textures that reflect the semantic query texts.
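To make the idea of a patch-wise text-image matching loss more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `toy_image_encoder` is a hypothetical stand-in for the CLIP image encoder, the random horizontal flip stands in for the multiview augmentations, the text embedding is a hand-picked vector, and the cosine-distance form of the loss is an assumption made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def random_patch(image, size, rng):
    """Crop a random size x size patch from a 2D list-of-lists image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(patch, rng):
    """Stand-in augmentation (assumption): flip the patch horizontally half the time."""
    return [row[::-1] for row in patch] if rng.random() < 0.5 else patch


def toy_image_encoder(patch):
    """Hypothetical placeholder for a CLIP image encoder: (mean, contrast)."""
    flat = [pixel for row in patch for pixel in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]


def patch_text_loss(image, text_emb, encode=toy_image_encoder,
                    n_patches=16, patch_size=4, seed=0):
    """Average cosine distance between augmented patch embeddings and a text embedding."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_patches):
        view = augment(random_patch(image, patch_size, rng), rng)
        losses.append(1.0 - cosine(encode(view), text_emb))
    return sum(losses) / len(losses)


# Toy 16 x 16 grayscale "stylized output" and a made-up text embedding.
image = [[((x + y) % 8) / 7 for x in range(16)] for y in range(16)]
loss = patch_text_loss(image, text_emb=[0.5, 1.0])
```

In a real setting the encoder and the text embedding would both come from a pre-trained CLIP model, and the loss would be minimized with respect to the network that produces the stylized image. Averaging over many small patches rather than the whole image is what pushes the text condition into local texture.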

Gihyun Kwon, Jong Chul Ye • 2021

Related benchmarks

Task	Dataset	Result
Semantic segmentation	Cityscapes	mIoU 32.4	668
Semantic segmentation	ACDC (test)	mIoU 36.75	103
Semantic segmentation	ACDC (Night)	mIoU 21.38	55
Semantic segmentation	ACDC (Rain)	mIoU 38.7	48
Semantic segmentation	ACDC Snow	mIoU 41.09	43
Semantic segmentation	GTA5	mIoU 38.73	35
Semantic segmentation	ACDC Snow (test)	mIoU 41	24
Affective Image Stylization	EmoEdit (inference)	CLIP Score 0.709	11
Affective Image Filter	AIF	SSIM 52.49	11
Text-driven Style Transfer	Custom Stylized Images 10 text conditions (test)	CLIP Score 0.2515	7

Showing 10 of 23 rows
