IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation
About
Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery, which extracts cohesive character identities, and Re-denoising Identity Injection, which re-denoises images to inject identities while preserving desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.
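The abstract highlights face consistency across the images of a story but does not say how it is measured. As an illustration only, one common way to quantify it is the mean pairwise cosine similarity between face embeddings of the same character across images. The sketch below assumes the embeddings have already been extracted by some face-recognition model; the function names `cosine_similarity` and `face_consistency` are hypothetical and are not taken from the paper or the ConsiStory-Human benchmark.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def face_consistency(embeddings):
    """Mean pairwise cosine similarity over one character's face embeddings.

    `embeddings` holds one vector per story image (assumed to come from
    an external face-embedding model, which is not part of this sketch).
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings to compare")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

Under this assumed definition, a score near 1 means the character's face looks alike in every image, while a lower score indicates identity drift between images.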
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Story Generation | Story Generation Evaluation Set | Text Alignment | 84.41 | 5 |
| Story-consistent Image Generation | User Study 20 story-based scenarios | Text Alignment (%) | 69.2 | 5 |
| Story Generation | ConsiStory-Human 1.0 (test) | CLIP-T Score | 35.4 | 5 |