
DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

About

Image-to-image translation aims to learn the mapping between two visual domains. The task poses two main challenges: 1) the lack of aligned training pairs and 2) the multiple possible outputs for a single input image. In this work, we present an approach based on disentangled representations for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space that captures information shared across domains, and a domain-specific attribute space. At test time, our model combines the content features encoded from a given input with attribute vectors sampled from the attribute space to synthesize diverse outputs. To handle unpaired training data, we introduce a cross-cycle consistency loss based on the disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluation, we measure realism with a user study and the Fréchet inception distance (FID), and we measure diversity with the perceptual distance metric, the Jensen-Shannon divergence, and the number of statistically different bins.
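
As an illustration only, the Python sketch below walks through the two swap stages of a cross-cycle consistency loss as described above, plus test-time sampling of attributes for diverse outputs. Everything in it is a hypothetical stand-in, not the paper's networks or released code: the toy "images" are vectors of the form [content, attribute], the encoders and generators (enc_content_x, gen_y, and so on) are hand-written slices and scalings, the content size K is arbitrary, and the standard normal attribute prior is an assumption. With perfectly disentangled toys, swapping attributes twice returns the original pair, so the L1 loss is zero.

```python
import numpy as np

K = 4  # hypothetical content dimension; the rest of each toy "image" is attribute

# Hypothetical toy encoders/generators standing in for the learned networks.
# Domain X: an "image" is simply [content, attribute].
def enc_content_x(x): return x[:K]
def enc_attr_x(x): return x[K:]
def gen_x(c, a): return np.concatenate([c, a])

# Domain Y renders content scaled by 2 to mimic a domain gap.
def enc_content_y(y): return y[:K] / 2.0
def enc_attr_y(y): return y[K:]
def gen_y(c, a): return np.concatenate([2.0 * c, a])


def cross_cycle_loss(x, y,
                     ecx=enc_content_x, eax=enc_attr_x, gx=gen_x,
                     ecy=enc_content_y, eay=enc_attr_y, gy=gen_y):
    """L1 cross-cycle consistency for one unpaired pair (x, y)."""
    # Stage 1: disentangle both inputs, then swap attributes across domains.
    cx, ax = ecx(x), eax(x)
    cy, ay = ecy(y), eay(y)
    u = gx(cy, ax)  # content of y with x's attribute, rendered in domain X
    v = gy(cx, ay)  # content of x with y's attribute, rendered in domain Y
    # Stage 2: encode the translations and swap attributes back.
    cu, au = ecx(u), eax(u)
    cv, av = ecy(v), eay(v)
    x_hat = gx(cv, au)  # should reconstruct x
    y_hat = gy(cu, av)  # should reconstruct y
    return np.abs(x - x_hat).mean() + np.abs(y - y_hat).mean()


def sample_translations(x, n, attr_dim, rng):
    """Translate x into domain Y n times, each with a fresh N(0, I) attribute."""
    c = enc_content_x(x)  # the content code stays fixed across samples
    return [gen_y(c, rng.standard_normal(attr_dim)) for _ in range(n)]
```

For example, `cross_cycle_loss(x, y)` is 0.0 for any x built with `gen_x` and y built with `gen_y`, whereas replacing `eax` with an encoder that discards the attribute makes the loss positive, which is the kind of failure the loss penalizes. Calls to `sample_translations` all share the same content entries and differ only in the sampled attribute part.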

Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang · 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Surface Normal Estimation | Bedroom Images In-domain | L1 Error | 0.0784 | 11 |
| Image-to-Image Translation | edges2shoes | FID | 53.373 | 11 |
| Image-to-Image Translation | GTA to Cityscapes (test) | SSIM | 0.14 | 10 |
| Image-to-Image Translation | GTA to KITTI (test) | SSIM | 0.08 | 9 |
| Monocular Depth Estimation | Generalization Images Out-of-domain | Relative Error (REL) | 0.4374 | 8 |
| Surface Normal Estimation | Generalization Images Out-of-domain | L1 Error | 0.135 | 8 |
| Intrinsic Image Decomposition | Bedroom Images In-domain | Albedo MSE | 0.0296 | 8 |
| Intrinsic Image Decomposition | Bedroom Images Out-of-domain | Albedo MSE | 0.0392 | 8 |
| Monocular Depth Estimation | Bedroom Images In-domain | REL | 37.92 | 8 |
| Unpaired Image-to-Image Translation | Cityscapes | Pixel Accuracy | 60.3 | 8 |

Showing 10 of 22 rows.
