A Variational U-Net for Conditional Appearance and Shape Generation

About

Deep generative models have demonstrated great performance in image synthesis. However, results deteriorate in case of spatial deformations, since they generate images of objects directly, rather than modeling the intricate interplay of their inherent shape and appearance. We present a conditional U-Net for shape-guided image generation, conditioned on the output of a variational autoencoder for appearance. The approach is trained end-to-end on images, without requiring samples of the same object with varying pose or appearance. Experiments show that the model enables conditional image generation and transfer. Therefore, either shape or appearance can be retained from a query image, while freely altering the other. Moreover, appearance can be sampled due to its stochastic latent representation, while preserving shape. In quantitative and qualitative experiments on COCO, DeepFashion, shoes, Market-1501 and handbags, the approach demonstrates significant improvements over the state-of-the-art.

Patrick Esser, Ekaterina Sutter, Björn Ommer • 2018
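The idea in the abstract — a VAE encoder that infers a stochastic appearance code, and a U-Net that generates the image conditioned on a shape estimate plus that code — can be sketched in PyTorch. This is a minimal toy illustration, not the authors' architecture: layer sizes, the single-channel shape map, and the loss weighting are all assumptions.

```python
# Toy sketch of a conditional VAE + shape-conditioned U-Net.
# Hypothetical layer sizes; not the architecture from the paper.
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    """VAE-style encoder q(z | x, y): image x + shape map y -> (mu, logvar)."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + 1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mu = nn.Linear(32, z_dim)
        self.logvar = nn.Linear(32, z_dim)

    def forward(self, x, y):
        h = self.conv(torch.cat([x, y], dim=1)).flatten(1)
        return self.mu(h), self.logvar(h)

class ShapeUNet(nn.Module):
    """U-Net generator p(x | y, z): shape map in, image out; z injected at the bottleneck."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(1, 16, 4, 2, 1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(16, 32, 4, 2, 1), nn.ReLU())
        self.fuse = nn.Conv2d(32 + z_dim, 32, 1)  # mix appearance code into features
        self.up1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(16 + 16, 3, 4, 2, 1)  # skip connection from down1

    def forward(self, y, z):
        d1 = self.down1(y)
        d2 = self.down2(d1)
        zmap = z[:, :, None, None].expand(-1, -1, d2.shape[2], d2.shape[3])
        h = self.fuse(torch.cat([d2, zmap], dim=1))
        u1 = self.up1(h)
        return torch.tanh(self.up2(torch.cat([u1, d1], dim=1)))

def elbo_loss(x, x_hat, mu, logvar):
    """Reconstruction term + KL(q(z|x,y) || N(0, I)): the standard VAE objective."""
    rec = (x - x_hat).abs().mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# End-to-end pass: encode appearance, reparameterize, decode with a shape map.
enc, gen = AppearanceEncoder(), ShapeUNet()
x, y = torch.randn(2, 3, 32, 32), torch.randn(2, 1, 32, 32)
mu, logvar = enc(x, y)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
x_hat = gen(y, z)
loss = elbo_loss(x, x_hat, mu, logvar)
```

Transfer then amounts to pairing the appearance code of one image with the shape map of another, and sampling z from the prior yields new appearances for a fixed shape.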

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Person Re-Identification | Market-1501 (train) | Rank-1 Acc | 65.3 | 80 |
| Person Image Generation | Market-1501 (test) | SSIM | 0.266 | 25 |
| Person Image Generation | DeepFashion (test) | SSIM | 0.763 | 19 |
| Human Pose Transfer | DeepFashion In-shop Clothes Retrieval (test) | SSIM | 0.763 | 14 |
| Person Image Generation | DeepFashion | FID | 23.583 | 11 |
| Person Image Synthesis | DeepFashion (test) | SSIM | 0.763 | 10 |
| Face Reenactment | same source | AU (%) | 80.2 | 7 |
| Video Reenactment | TED-talks dataset (test) | IE | 1.19 | 7 |
| Face Reenactment | cross source | AU (%) | 79.4 | 7 |
| Face Reenactment | in the wild | AU (%) | 79.6 | 7 |
Showing 10 of 15 rows
