
ManiGAN: Text-Guided Image Manipulation

About

The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: text-image affine combination module (ACM) and detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation. Meanwhile, it encodes original image features to help reconstruct text-irrelevant contents. The DCM rectifies mismatched attributes and completes missing contents of the synthetic image. Finally, we suggest a new metric for evaluating image manipulation results, in terms of both the generation of new attributes and the reconstruction of text-irrelevant contents. Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method. Code is available at https://github.com/mrlibw/ManiGAN.
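To make the role of the text-image affine combination module (ACM) more concrete, below is a minimal illustrative sketch in PyTorch: hidden text-conditioned features are modulated by per-pixel scale and bias maps predicted from the original image features, so text-relevant regions can be edited while original image content is carried through. The channel sizes, kernel size, and layer choices here are assumptions for illustration, not the paper's exact configuration (see the official repository for the real implementation).

```python
import torch
import torch.nn as nn

class AffineCombination(nn.Module):
    """Illustrative sketch of a text-image affine combination module (ACM).

    h: text-conditioned hidden features; v: features of the original image.
    The module predicts a per-pixel scale and bias from v and applies them to h,
    which lets the generator edit text-relevant regions while re-injecting
    original image information for text-irrelevant content.
    """

    def __init__(self, text_channels: int, image_channels: int):
        super().__init__()
        # Scale W(v) and bias b(v) predicted from the image features (assumed 3x3 convs).
        self.scale = nn.Conv2d(image_channels, text_channels, kernel_size=3, padding=1)
        self.bias = nn.Conv2d(image_channels, text_channels, kernel_size=3, padding=1)

    def forward(self, h: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # h: (B, C_t, H, W), v: (B, C_v, H, W) at the same spatial resolution
        return h * self.scale(v) + self.bias(v)


if __name__ == "__main__":
    acm = AffineCombination(text_channels=64, image_channels=32)
    h = torch.randn(1, 64, 16, 16)
    v = torch.randn(1, 32, 16, 16)
    print(acm(h, v).shape)  # torch.Size([1, 64, 16, 16])
```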

Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr · 2019

Related benchmarks

Task | Dataset | Metric | Value | Rank
Affective Image Filter | AIF | SSIM | 50.72 | 11
Semantic Image Translation | ImageNet (test) | LPIPS | 21.7 | 6
Affective Image Filtering | User Study (test) | EPS (%) | 7.63 | 6
Text-Guided Image Manipulation | Multi-Modal CelebA-HQ | FID | 117.9 | 5
Text-Guided Image Manipulation | Multi-Modal CelebA-HQ Non-CelebA | FID | 143.4 | 5
Text-Guided Image Manipulation | CUB (test) | CLIP Score | 21.3 | 3
Text-Guided Image Manipulation | Oxford (test) | CLIP Score | 21.59 | 3
Text-Guided Image Manipulation | Multi-Modal CelebA-HQ Open-Text | FID | 141.5 | 3
Text-Guided Image Manipulation | COCO (test) | CLIP Score | 11.91 | 3
Text-driven Image Editing | COCO (random edits) | IS | 14.96 | 2
