
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

About

Text-to-image (T2I) models such as Stable Diffusion have been used to generate high-quality images of people. However, due to the stochastic nature of the generation process, the person's appearance (e.g. pose, face, and clothing) varies across samples, even when the same text prompt is used. This appearance inconsistency makes T2I models unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera-view interpolation while maintaining the person's appearance.
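The interpolation capability described above relies on the conditioning signal being a small parameter vector rather than an image, so intermediate poses and viewpoints can be obtained by blending two parameter vectors. A minimal sketch of that idea, assuming simple linear interpolation of SMPL-like pose/camera vectors (the function name and shapes are illustrative, not taken from the paper):

```python
import numpy as np

def interpolate_params(theta_a, theta_b, num_steps):
    """Linearly interpolate between two low-dimensional body-model
    parameter vectors (hypothetical SMPL-like pose/camera vectors).

    Returns `num_steps` vectors, including both endpoints."""
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * theta_a + t * theta_b for t in ts]

# Each intermediate vector would condition the diffusion model,
# producing frames whose pose/viewpoint vary smoothly while the text
# and visual prompts keep the person's appearance fixed.
frames = interpolate_params([0.0, 0.0], [1.0, 2.0], num_steps=5)
```

In practice, rotation components of such parameters are often interpolated on the rotation manifold (e.g. via quaternion slerp) rather than linearly; the sketch above only conveys the low-dimensional blending idea.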

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reposing | WPose (Out-of-Domain) | FID | 75.653 | 10 |
| Reposing | DeepFashion (In-Domain) | FID | 9.611 | 10 |
| Pose Transfer | DeepFashion reduced (test) | FID | 7.876 | 7 |
| Multi-view pose transfer | DeepFashion Multimodal | SSIM | 0.7085 | 5 |
| Text-and-pose guided image generation | DeepFashion Multimodal Text2Human | FID | 23.46 | 3 |
| Text-based Human Image Manipulation | WVTON (test) | FID | 138.2 | 3 |
| Text Manipulation | WVTON Full Edit (test) | Pose Accuracy | 13.2 | 2 |
