ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

About

Generating high-fidelity human video with a specified identity has attracted significant attention in the content generation community. However, existing techniques struggle to balance training efficiency and identity preservation: they either require tedious case-by-case fine-tuning or tend to lose identity details during video generation. In this study, we present ID-Animator, a zero-shot human-video generation approach that performs personalized video generation from a single reference facial image without further training. ID-Animator builds on existing diffusion-based video generation backbones and adds a face adapter that encodes ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that incorporates unified human attribute and action captioning techniques from a constructed facial image pool. Based on this pipeline, we further devise a random reference training strategy that captures the ID-relevant embeddings precisely with an ID-preserving loss, improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator over previous models in generating personalized human videos. Moreover, our method is highly compatible with popular pre-trained T2V models such as AnimateDiff and with various community backbone models, showing high extensibility in real-world video generation applications where identity preservation is highly desired. Our code and checkpoints are released at https://github.com/ID-Animator/ID-Animator.

Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang • 2024

Related benchmarks

Task	Dataset	Result
Identity-Preserving Video Generation	OpenS2V (test)	Face Similarity: 0.316	17
Subject-Preserving Video Generation	OpenS2V-Eval Human-Domain	Total Score: 43.37	17
Video Customization	70-example benchmark 1.0 (test)	FaceSim Arc: 0.31	9
Identity-Preserving Text-to-Video generation	IPT2V (test)	FaceSim-Arc: 0.31	8
Identity-consistent video generation	User Study 15 identities	Face Similarity Score: 2.447	8
Video Customization	DreamBooth Custom	MS Score: 99.3	7
Video Customization	OpenCustom	MS Score: 99.14	7
Video Line Art Colorization	Anime Video Clips 200 clips (test)	PSNR: 15.61	5
Identity-Preserving Text-to-Video	IPT2V Evaluation (test)	FaceSim Arc: 0.32	2
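
Several rows above report a face-similarity score. This page does not define how those scores are computed, so the following is only a minimal sketch under an assumption: that such a metric is the cosine similarity between an embedding of the reference face and embeddings of faces in the generated frames, averaged over frames. The function names and the toy vectors are hypothetical, and a real evaluation would obtain the embeddings from a face-recognition model rather than from hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_face_similarity(reference_embedding, frame_embeddings):
    """Average cosine similarity between one reference face embedding
    and the face embeddings of each generated frame (assumed metric)."""
    if not frame_embeddings:
        raise ValueError("at least one frame embedding is required")
    scores = [cosine_similarity(reference_embedding, f) for f in frame_embeddings]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real face embeddings.
    reference = [1.0, 0.0, 0.0]
    frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(mean_face_similarity(reference, frames))
```

Under this assumption a higher score means the generated faces lie closer to the reference identity in embedding space. Scores from different benchmarks are not directly comparable, since each may use a different embedding model and frame-sampling scheme.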
