Deep Image Spatial Transformation for Person Image Generation

About

Pose-guided person image generation transforms a source image of a person into a target pose. This task requires spatial manipulation of the source data. However, Convolutional Neural Networks lack the ability to spatially transform their inputs. In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. Specifically, our model first calculates the global correlations between sources and targets to predict flow fields. Then, the flowed local patch pairs are extracted from the feature maps to calculate the local attention coefficients. Finally, we warp the source features using a content-aware sampling method with the obtained local attention coefficients. The results of both subjective and objective experiments demonstrate the superiority of our model. In addition, results on video animation and view synthesis show that our model is applicable to other tasks requiring spatial transformation. Our source code is available at https://github.com/RenYurui/Global-Flow-Local-Attention.
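The sampling step described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name `local_attention_warp` and its parameters are hypothetical, the flow is rounded to the nearest pixel rather than sampled bilinearly, and the learned attention coefficients are replaced by a plain dot-product similarity followed by a softmax. It only shows the general shape of the idea, where each target location gathers a small source patch at the position the flow points to and blends it with content-dependent weights.

```python
import numpy as np

def local_attention_warp(src, tgt, flow, n=3):
    """Warp source features toward the target layout (illustrative sketch).

    src, tgt: (C, H, W) feature maps.
    flow:     (2, H, W) offsets (dy, dx) from each target location to the source.
    n:        odd patch size used for the local attention window.
    """
    C, H, W = src.shape
    r = n // 2
    # Zero-pad the source so every n x n patch is defined at the borders.
    src_p = np.pad(src, ((0, 0), (r, r), (r, r)))
    out = np.zeros_like(src, dtype=float)
    for y in range(H):
        for x in range(W):
            # Source location the flow points to (assumption: nearest pixel, clipped).
            sy = int(np.clip(np.rint(y + flow[0, y, x]), 0, H - 1))
            sx = int(np.clip(np.rint(x + flow[1, y, x]), 0, W - 1))
            # n x n source patch centered at (sy, sx) in unpadded coordinates.
            patch = src_p[:, sy:sy + n, sx:sx + n]
            # Assumption: dot-product similarity stands in for learned coefficients.
            scores = np.einsum("c,cij->ij", tgt[:, y, x], patch)
            scores -= scores.max()  # numerical stability before the softmax
            k = np.exp(scores)
            k /= k.sum()
            # Content-aware sampling: attention-weighted sum over the patch.
            out[:, y, x] = (patch * k).sum(axis=(1, 2))
    return out
```

With `n=1` the attention window collapses to a single pixel, so the function reduces to plain flow-based warping; larger windows let the weights choose among neighboring source features.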

Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li • 2020

Related benchmarks

Task	Dataset	Metric	Value	
Human Pose Transfer	DeepFashion In-shop Clothes Retrieval (test)	SSIM	0.79	14
Person Image Generation	DeepFashion	FID	14.061	11
Person Image Synthesis	DeepFashion 256 x 176 (test)	FID	10.573	9
Pose Transfer	DeepFashion (test)	User Preference Score	47.73	9
Pose Transfer	DeepFashion 256 x 256 (test)	FID	10.57	7
Human Pose Transfer	DeepFashion (test)	R2G	19.53	7
Human Pose Transfer	Market-1501 (test)	SSIM	0.281	7
Person Image Synthesis	Market-1501 128 x 64 (test)	FID	19.751	5
