Semantically Multi-modal Image Synthesis

About

In this paper, we focus on the semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level. Previous work uses multiple class-specific generators, which restricts it to datasets with a small number of classes. We instead propose a novel Group Decreasing Network (GroupDNet) that leverages group convolutions in the generator and progressively decreases the group numbers of the convolutions in the decoder. Consequently, GroupDNet offers much more control over translating semantic labels to natural images and produces plausible, high-quality results on datasets with many classes. Experiments on several challenging datasets demonstrate the superiority of GroupDNet on the SMIS task. We also show that GroupDNet can perform a wide range of interesting synthesis applications. Code and models are available at: https://github.com/Seanseattle/SMIS.
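The abstract describes a generator built from group convolutions whose group count shrinks through the decoder. The following is a minimal NumPy sketch of that idea, not GroupDNet's actual code (which lives in the linked repository). It assumes 1x1 kernels, one group per semantic class at the first stage, and a halving schedule; the function names `grouped_conv1x1`, `decreasing_groups`, `random_weights`, and `grouped_params`, as well as the channel counts, are hypothetical illustrations.

```python
import numpy as np

def grouped_conv1x1(x, weights):
    """1x1 group convolution (hypothetical helper).

    x: array of shape (C_in, H, W).
    weights: list of G matrices, each of shape (C_out_per_group, C_in_per_group).
    Group g's outputs are computed from group g's input channels only.
    """
    groups = len(weights)
    c_in = x.shape[0]
    assert c_in % groups == 0, "C_in must be divisible by the group count"
    per = c_in // groups
    outputs = []
    for g, w in enumerate(weights):
        xg = x[g * per:(g + 1) * per]  # this group's input channels
        # Contract over this group's input channels only; no cross-group mixing.
        outputs.append(np.einsum("oi,ihw->ohw", w, xg))
    return np.concatenate(outputs, axis=0)

def decreasing_groups(num_classes, num_stages):
    """Assumed schedule: start with one group per class, halve each stage, floor at 1."""
    return [max(num_classes // (2 ** s), 1) for s in range(num_stages)]

def random_weights(channels, groups, rng):
    """Random square per-group weights (stand-in for learned parameters)."""
    per = channels // groups
    return [rng.standard_normal((per, per)) for _ in range(groups)]

def grouped_params(channels, groups):
    """Weights in a bias-free 1x1 grouped conv with equal in/out channels: C^2 / G."""
    return groups * (channels // groups) ** 2

rng = np.random.default_rng(0)
C, H, W = 16, 4, 4          # 8 hypothetical classes x 2 channels each
schedule = decreasing_groups(num_classes=8, num_stages=4)
print(schedule)             # [8, 4, 2, 1]
print([grouped_params(C, g) for g in schedule])  # [32, 64, 128, 256]

# Controllability check: perturb only class 3's channels (6 and 7).
x = rng.standard_normal((C, H, W))
w8 = random_weights(C, 8, rng)
y = grouped_conv1x1(x, w8)
x_edit = x.copy()
x_edit[6:8] += 1.0
y_edit = grouped_conv1x1(x_edit, w8)
changed = np.where(np.abs(y_edit - y).sum(axis=(1, 2)) > 1e-9)[0]
print(changed.tolist())     # [6, 7]: other classes' outputs are untouched

# Decoder pass: later stages use fewer groups, so classes start to mix.
h = x
for g in schedule:
    h = grouped_conv1x1(h, random_weights(C, g, rng))
print(h.shape)              # (16, 4, 4)
```

Under this sketch, a grouped layer never mixes channels across groups, so editing one class's input channels leaves every other class's outputs unchanged, which illustrates the kind of per-class control the abstract refers to. Lowering the group count in later stages lets information from different classes mix again, and the per-layer weight count grows from C²/G toward C².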

Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai (2020)

Related benchmarks

Task	Dataset	Metric	Result	
Semantic segmentation	Cityscapes (test)	mIoU	93.1	1252
Semantic Image Synthesis	ADE20K	FID	39.11	66
Semantic Image Synthesis	Cityscapes	FID	41.12	54
Semantic Image Synthesis	Cityscapes (test)	LPIPS	0.546	48
Depth Estimation	Cityscapes (test)	--	--	40
Semantic Image Synthesis	CelebAMask-HQ	FID	25.9	33
Image-to-Image Translation	CelebA-HQ	FID	23.71	32
Image-to-Image Translation	DeepFashion (val)	FID	22.23	9
Exemplar-based image translation	DeepFashion	FID	22.23	9
Image-to-Image Translation	ADE20K (train val)	FID	42.17	9
