PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

About

Recent advances in text-to-image generation have enabled remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which encodes an arbitrary number of input ID images into a stacked ID embedding that preserves ID information. Such an embedding, serving as a unified ID representation, can not only comprehensively encapsulate the characteristics of a single input ID, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. In addition, to drive the training of PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Trained on the dataset constructed through the proposed pipeline, PhotoMaker demonstrates better ID preservation than test-time fine-tuning based methods, while also providing significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/

Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan • 2023
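
The idea of encoding an arbitrary number of ID images into one stacked embedding can be illustrated with a minimal sketch. This is not the PhotoMaker implementation: `encode_image` is a hypothetical stand-in for a real image encoder, the embedding size `dim` is an arbitrary choice, and images are represented as flat lists of floats purely for illustration. The only point shown is that N input images yield an N x dim stack, whatever N is.

```python
from typing import List, Sequence

Embedding = List[float]


def encode_image(pixels: Sequence[float], dim: int = 4) -> Embedding:
    """Hypothetical encoder (assumption): average pixels into `dim` buckets.

    A real system would use a learned image encoder here.
    """
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    sums = [0.0] * dim
    counts = [0] * dim
    for i, value in enumerate(pixels):
        sums[i % dim] += value
        counts[i % dim] += 1
    # Buckets that received no pixel default to 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def stack_id_embeddings(images: Sequence[Sequence[float]], dim: int = 4) -> List[Embedding]:
    """Encode each ID image and stack the results into an N x dim list."""
    if not images:
        raise ValueError("at least one ID image is required")
    return [encode_image(img, dim) for img in images]


if __name__ == "__main__":
    one = stack_id_embeddings([[0.1, 0.2, 0.3, 0.4]])
    three = stack_id_embeddings([[0.1] * 8, [0.5] * 8, [0.9] * 8])
    print(len(one), len(one[0]))      # stack height follows the number of inputs
    print(len(three), len(three[0]))
```

Because the stack is just a sequence of per-image embeddings, images of the same person or of different people can be placed in the same stack, which is the property the abstract refers to as accommodating different IDs.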

Related benchmarks

Task	Dataset	Metric	Value
Consistent Text-to-Image Generation	ConsiStory+ (test)	CLIP-T	0.8812	23
Multi-frame visual story generation	ConsiStory+	CLIP-T	86.51	12
ID-preserving generation	Web100	Facial Similarity	0.598	12
ID-preserving generation	CelebA300	Facial Similarity	0.551	12
Subject Personalization	CelebA-HQ	Identity Score	22.71	11
Safe Generation Rate	I2P	GPT-4o Score	83.22	9
Safe Generation Rate	Misbinding	GPT-4o Score	0.6976	8
Prompt-image Alignment	I2P	CLIPScore	0.6916	8
Prompt-image Alignment	Sneakyprompt	CLIPScore	0.6504	8
Prompt-image Alignment	Misbinding	CLIPScore	0.7873	8

Showing 10 of 27 rows
