ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
About
Generating high-fidelity human videos with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to balance training efficiency and identity preservation: they either require tedious case-by-case fine-tuning or lose identity details during video generation. In this study, we present ID-Animator, a zero-shot human-video generation approach that performs personalized video generation from a single reference facial image without further training. ID-Animator builds on existing diffusion-based video generation backbones and adds a face adapter that encodes ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that combines unified human-attribute and action captioning techniques with a constructed facial image pool. Based on this pipeline, we further devise a random reference training strategy that precisely captures ID-relevant embeddings with an ID-preserving loss, improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate that ID-Animator outperforms previous models in generating personalized human videos. Moreover, our method is highly compatible with popular pre-trained T2V models such as AnimateDiff and various community backbone models, making it readily extendable to real-world video generation applications where identity preservation is essential. Our code and checkpoints are released at https://github.com/ID-Animator/ID-Animator.
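The face adapter described above can be illustrated with a minimal sketch, assuming a single-head cross-attention in which a small set of learnable query vectors attends to feature tokens extracted from the reference face image. The class name `FaceAdapterSketch`, the dimensions, and the random initialization are hypothetical; the abstract does not specify the adapter's layers, so the released code in the repository should be consulted for the actual design. Plain Python lists stand in for tensors so the sketch runs without dependencies.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialized matrix (stand-in for trained parameters)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class FaceAdapterSketch:
    """Hypothetical face adapter: learnable latent queries cross-attend to
    face-image feature tokens and return one ID embedding per query."""

    def __init__(self, num_queries, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Learnable facial latent queries (would be trained; random here).
        self.queries = random_matrix(num_queries, dim, rng)
        # Query, key, and value projection matrices.
        self.w_q = random_matrix(dim, dim, rng)
        self.w_k = random_matrix(dim, dim, rng)
        self.w_v = random_matrix(dim, dim, rng)

    def __call__(self, face_tokens):
        q = matmul(self.queries, self.w_q)     # (num_queries, dim)
        k = matmul(face_tokens, self.w_k)       # (num_tokens, dim)
        v = matmul(face_tokens, self.w_v)       # (num_tokens, dim)
        k_t = [list(col) for col in zip(*k)]    # (dim, num_tokens)
        scale = 1.0 / math.sqrt(self.dim)
        scores = matmul(q, k_t)                 # (num_queries, num_tokens)
        attn = [softmax([s * scale for s in row]) for row in scores]
        return matmul(attn, v)                  # (num_queries, dim)


# Usage: 16 random vectors stand in for image-encoder features of one face.
rng = random.Random(1)
face_tokens = random_matrix(16, 8, rng, scale=1.0)
adapter = FaceAdapterSketch(num_queries=4, dim=8)
id_embeddings = adapter(face_tokens)
print(len(id_embeddings), len(id_embeddings[0]))  # 4 8
```

Because the number of queries is fixed, an adapter of this shape yields a fixed-size set of ID embeddings no matter how many feature tokens the image encoder emits, which is one appeal of learnable-query designs for compressing image features before they condition a video backbone.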
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Identity-Preserving Video Generation | OpenS2V (test) | Face Similarity | 0.316 | 17 |
| Video Customization | 70-example benchmark 1.0 (test) | FaceSim-Arc | 0.31 | 9 |
| Identity-Consistent Video Generation | User Study (15 identities) | Face Similarity Score | 2.447 | 8 |
| Video Line Art Colorization | Anime Video Clips (200 clips, test) | PSNR (dB) | 15.61 | 5 |
| Identity-Preserving Text-to-Video | IPT2V Evaluation (test) | FaceSim-Arc | 0.32 | 2 |
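Several rows above report face-similarity scores. A common way to compute such a score, assumed here rather than taken from these benchmarks' protocols, is the mean cosine similarity between a face-recognition embedding of the reference image (e.g. from an ArcFace model, which the "Arc" in the FaceSim metric names suggests) and the embedding of the face in each generated frame. The sketch takes precomputed embeddings as plain lists and does not run a face detector or recognizer; the function names are hypothetical. It also includes the standard PSNR formula, assuming 8-bit pixel values, for the colorization row. The user-study score of 2.447 lies outside the [-1, 1] range of a cosine similarity, so it is presumably a human rating on a different scale and is not covered here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def face_similarity(reference_emb, frame_embs):
    """Mean cosine similarity between the reference face embedding and the
    face embedding of each generated frame (hypothetical protocol)."""
    return sum(cosine_similarity(reference_emb, f) for f in frame_embs) / len(frame_embs)


def psnr(img_a, img_b, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    mse = sum((x - y) ** 2 for x, y in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage with toy embeddings: one frame matches the reference, one is orthogonal.
ref = [1.0, 0.0, 0.0]
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(face_similarity(ref, frames))  # 0.5
print(psnr([0, 0, 0, 0], [10, 10, 10, 10]))
```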