Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model
About
Pose-Guided Person Image Synthesis (PGPIS) aims to generate human images in specified poses while preserving the identity and appearance of a source image. This technology facilitates diverse applications, including virtual try-on, digital avatars, animation, and sign language generation. Despite the high-quality results of recent diffusion-based PGPIS, these models typically depend on implicit feature aggregation within the denoising process. As a result, fine-grained texture preservation is limited, and even for the same identity, it is difficult to ensure consistent generation under variations in pose and source appearance. To address these limitations, we propose Fusion Embedding for PGPIS using a Diffusion Model (FPDM), the first framework that explicitly aligns fused source-pose embeddings with target image embeddings via contrastive learning, and subsequently employs the learned fusion embedding as a conditioning signal for generation. FPDM integrates an Image-Pose Fusion (IPF) module into our proposed Source-Enhanced Pose Fusion approach to learn a fusion embedding aligned with the target image. We then employ a conditional diffusion model guided by source appearance, target pose, and the learned fusion embedding. Experiments on the DeepFashion benchmark and the RWTH-PHOENIX-Weather 2014T dataset demonstrate competitive performance compared to existing methods in both quantitative and qualitative evaluations, with ablation studies confirming that explicit fusion embedding alignment substantially improves texture fidelity and consistency across pose and source appearance variations.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Pose-guided person image synthesis | DeepFashion 256 × 176 resolution (test) | FID7.318 | 13 | |
| Sign Language Video Generation | RWTH-PHOENIX-Weather 2014T (test) | SSIM86.4 | 10 | |
| Pose-guided person image synthesis | DeepFashion 512 × 352 resolution (test) | FID7.534 | 9 |