Multimodal Conditional Image Synthesis with Product-of-Experts GANs

About

Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference. They are often unable to leverage multimodal user inputs when available, which reduces their practicality. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set. PoE-GAN consists of a product-of-experts generator and a multimodal multiscale projection discriminator. Through our carefully designed training scheme, PoE-GAN learns to synthesize images with high quality and diversity. Besides advancing the state of the art in multimodal conditional image synthesis, PoE-GAN also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting. The project website is available at https://deepimagination.github.io/PoE-GAN.

Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu • 2021
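
The abstract does not spell out how the product-of-experts generator combines modalities, so the following is only a minimal sketch of the general product-of-experts idea, not the authors' implementation. It assumes that each available modality contributes a diagonal Gaussian over a shared latent vector and that a standard-normal prior expert is always present. Under those assumptions, conditioning on any subset of modalities, including the empty set, reduces to multiplying the experts that are available. The function name `product_of_gaussians` and the example expert values are hypothetical.

```python
from typing import List, Sequence, Tuple

# (mean, variance) per latent dimension; a diagonal Gaussian (assumption).
Gaussian = Tuple[List[float], List[float]]


def product_of_gaussians(experts: Sequence[Gaussian], dim: int) -> Gaussian:
    """Combine diagonal Gaussian experts with a standard-normal prior expert.

    A product of Gaussians is itself Gaussian: precisions (1 / variance) add,
    and the mean is the precision-weighted average of the expert means.
    """
    # Start from the prior N(0, 1): precision 1, precision-weighted mean 0.
    precision = [1.0] * dim
    weighted_mean = [0.0] * dim
    for mean, var in experts:
        for i in range(dim):
            precision[i] += 1.0 / var[i]
            weighted_mean[i] += mean[i] / var[i]
    mean = [weighted_mean[i] / precision[i] for i in range(dim)]
    var = [1.0 / precision[i] for i in range(dim)]
    return mean, var


# Hypothetical encoder outputs for two modalities (values are made up).
text_expert = ([2.0, 0.0], [1.0, 1.0])
sketch_expert = ([0.0, 4.0], [1.0, 0.5])

both = product_of_gaussians([text_expert, sketch_expert], dim=2)
text_only = product_of_gaussians([text_expert], dim=2)
no_input = product_of_gaussians([], dim=2)  # empty set: falls back to the prior
```

In this sketch, each added expert can only sharpen the combined distribution (its variance never grows), which mirrors the intuition that more user inputs constrain the output more tightly, while an empty input set leaves the unconditional prior.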

Related benchmarks

Task	Dataset	Result
Semantic Image Synthesis	COCO Stuff (val)	FID 15.8	42
Layout-to-Image Synthesis	Coco-Stuff (test)	FID 15.8	25
Text-to-Image Synthesis	MM-CelebA-HQ 256x256	FID 13.71	7
Text-to-Image Synthesis	MS-COCO 2017 (test)	FID 20.5	7
Segmentation-to-Image Synthesis	MS-COCO 2017 (test)	FID 15.8	4
Conditional Image Synthesis	MM-CelebA-HQ 1024x1024 1.0 (test)	Score (Segmentation) 9.9	4
Text-to-Image Synthesis	MS-COCO (test)	--	4
Segmentation-to-Image Synthesis	MS-COCO	Preference Rate 69	3
Sketch-to-Image Synthesis	MS-COCO 2017 (test)	FID 25.5	2
Unconditional image synthesis	MS-COCO 2017 (test)	FID 43.4	2

