LatentKeypointGAN: Controlling Images via Latent Keypoints

About

Generative adversarial networks (GANs) have attained photo-realistic quality in image generation. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN which is trained end-to-end on the classical GAN objective with internal conditioning on a set of spatial keypoints. These keypoints have associated appearance embeddings that respectively control the position and style of the generated objects and their parts. A major difficulty that we address with suitable network architectures and training schemes is disentangling the image into spatial and appearance factors without domain knowledge and supervision signals. We demonstrate that LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images by re-positioning and exchanging keypoint embeddings, such as generating portraits by combining the eyes, nose, and mouth from different images. In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.
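To make the split between "position" and "style" concrete, here is a minimal NumPy sketch of one way keypoints and appearance embeddings could be combined into a spatial feature map. It assumes each keypoint is rendered as a Gaussian heatmap and that the map is the heatmap-weighted sum of the per-keypoint embeddings; the names `gaussian_heatmaps`, `keypoint_feature_map`, `swap_part`, and the `sigma` value are hypothetical, and this is not the authors' implementation (which also involves learned generator layers not shown here).

```python
import numpy as np

def gaussian_heatmaps(keypoints, size, sigma=0.05):
    """Render K keypoints, given as normalized (x, y) in [0, 1], as K Gaussian heatmaps of shape (K, H, W)."""
    h, w = size
    ys = np.linspace(0.0, 1.0, h)[:, None]            # (H, 1) row coordinates
    xs = np.linspace(0.0, 1.0, w)[None, :]            # (1, W) column coordinates
    kx = keypoints[:, 0][:, None, None]               # (K, 1, 1)
    ky = keypoints[:, 1][:, None, None]               # (K, 1, 1)
    d2 = (xs[None] - kx) ** 2 + (ys[None] - ky) ** 2  # squared distance, broadcast to (K, H, W)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def keypoint_feature_map(keypoints, embeddings, size, sigma=0.05):
    """Spread each keypoint's appearance embedding (K, C) over its heatmap, giving a (C, H, W) map."""
    heat = gaussian_heatmaps(keypoints, size, sigma)  # keypoint positions decide WHERE
    return np.einsum("khw,kc->chw", heat, embeddings) # embeddings decide WHAT appears there

def swap_part(emb_a, emb_b, k):
    """Hypothetical part transfer: copy emb_a but take the k-th embedding from emb_b."""
    out = emb_a.copy()
    out[k] = emb_b[k]
    return out

# Illustrative usage: three keypoints (say, two eyes and a mouth) with 8-dim embeddings.
rng = np.random.default_rng(0)
kps = np.array([[0.3, 0.4], [0.7, 0.4], [0.5, 0.7]])
emb_a = rng.normal(size=(3, 8))
emb_b = rng.normal(size=(3, 8))
feat = keypoint_feature_map(kps, swap_part(emb_a, emb_b, 2), size=(32, 32))
print(feat.shape)  # (8, 32, 32)
```

Because the map is linear in the embeddings, moving a keypoint only relocates its Gaussian, while swapping one embedding changes the content around that keypoint and nowhere else (up to the Gaussian overlap), which mirrors the re-positioning and exchanging described above.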

Xingzhe He, Bastian Wandt, Helge Rhodin • 2021

Related benchmarks

Task	Dataset	Metric	Result
Landmark Detection	CelebA Wild (K=8) (test)	Normalized L2 Distance (%)	5.63
Landmark Detection	CUB Category 001 2011 (test)	Normalized L2 Distance	22.6
Landmark Detection	CUB Category 002 2011 (test)	Normalized L2 Distance	29.1
Keypoint Detection	CUB-200-2011 all	Mean L2 Error	14.7
Landmark Detection	CelebA Wild (K=4) (test)	Normalized L2 Distance	12.1
Landmark Detection	CelebA Aligned (K=10) (test)	Normalized L2 Distance (%)	3.31
Landmark Detection	CUB-003	Normalized L2 Distance	0.212
Keypoint Detection	CUB-200 Aligned 2011	Mean L2 Error	5.21
Landmark Detection	Taichi (test)	L2 Distance	437.7
Landmark Detection	CUB (all)	Normalized L2 Distance	14.7
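The table does not define "Normalized L2 Distance". The sketch below assumes a protocol common in the unsupervised-landmark literature: a linear regressor (with bias) is fitted from discovered keypoints to annotated landmarks, and the mean per-landmark L2 error is divided by the inter-ocular distance and reported in percent. The function names and the eye indices are hypothetical, and the exact normalization behind each row above (especially the CUB and Taichi rows) may differ.

```python
import numpy as np

def fit_linear_regressor(pred_train, gt_train):
    """Least-squares map (with bias) from discovered keypoints (N, K, 2) to landmarks (N, L, 2)."""
    n = len(pred_train)
    X = np.hstack([pred_train.reshape(n, -1), np.ones((n, 1))])  # (N, 2K + 1)
    Y = gt_train.reshape(n, -1)                                  # (N, 2L)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def apply_regressor(W, pred):
    """Map discovered keypoints (N, K, 2) to predicted landmarks (N, L, 2)."""
    n = len(pred)
    X = np.hstack([pred.reshape(n, -1), np.ones((n, 1))])
    return (X @ W).reshape(n, -1, 2)

def normalized_l2_percent(pred, gt, left_eye=0, right_eye=1):
    """Mean per-landmark L2 error divided by inter-ocular distance, in percent.
    The eye indices are an assumption about the annotation order."""
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=1)              # (N,) mean error per image
    iod = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)  # (N,) inter-ocular distance
    return 100.0 * float(np.mean(err / iod))

# Worked example: one image, eyes 10 px apart, left-eye prediction off by 1 px.
gt = np.array([[[0.0, 0.0], [10.0, 0.0]]])
pred = np.array([[[1.0, 0.0], [10.0, 0.0]]])
score = normalized_l2_percent(pred, gt)  # mean error 0.5 px / 10 px = 5 %
```

Under this protocol, a lower value is better, and the regression step is what lets keypoints discovered without supervision be compared against a fixed set of human-annotated landmarks.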

