HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

About

State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives that are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within shots but sparse between them) ensures the efficiency required for minute-scale generation. Beyond setting a new state of the art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated filmmaking, making end-to-end cinematic creation a tangible future. Our code is available at: https://holo-cine.github.io/.

Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu • 2025
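
The two attention patterns named in the abstract can be pictured as boolean masks. The following is a minimal sketch, not the authors' implementation: the abstract does not say which tokens remain visible between shots, so the rule used here (every shot exposes its first `summary_tokens` tokens to all other shots) is an illustrative assumption, as are the function names, the optional scene-level prompt tokens, and the text-token layout.

```python
def shot_ids(shot_lengths):
    """Map every video token position to the index of the shot it belongs to."""
    return [s for s, n in enumerate(shot_lengths) for _ in range(n)]


def sparse_self_attention_mask(shot_lengths, summary_tokens=1):
    """mask[q][k] is True when video token q may attend to video token k.

    Dense within a shot. Across shots, only the first `summary_tokens`
    tokens of each shot are visible (ASSUMED sparsity rule, for illustration).
    """
    ids = shot_ids(shot_lengths)
    starts, offset = [], 0
    for n in shot_lengths:
        starts.append(offset)
        offset += n
    total = len(ids)
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            same_shot = ids[q] == ids[k]
            is_summary = (k - starts[ids[k]]) < summary_tokens
            mask[q][k] = same_shot or is_summary
    return mask


def window_cross_attention_mask(shot_lengths, prompt_lengths, global_len=0):
    """mask[q][t] is True when video token q may attend to text token t.

    ASSUMED text layout: `global_len` scene-level tokens first, followed by
    each shot's prompt tokens in shot order. A video token sees the
    scene-level tokens plus the prompt of its own shot only.
    """
    ids = shot_ids(shot_lengths)
    text_owner = [-1] * global_len + shot_ids(prompt_lengths)
    return [[owner == -1 or owner == sid for owner in text_owner] for sid in ids]


if __name__ == "__main__":
    # Two shots of 3 and 2 video tokens; per-shot prompts of 2 and 1 text tokens.
    self_mask = sparse_self_attention_mask([3, 2], summary_tokens=1)
    cross_mask = window_cross_attention_mask([3, 2], [2, 1], global_len=1)
    allowed = sum(sum(row) for row in self_mask)
    print(f"self-attention pairs kept: {allowed} of {len(self_mask) ** 2}")
    print("cross-attention row for first token of shot 1:", cross_mask[3])
```

Under this assumed rule the number of inter-shot pairs grows with the number of shots times the summary size, not with the square of the total sequence length, which is the kind of saving the abstract attributes to the sparse pattern. In a real model such masks would typically be passed to an attention kernel that accepts a boolean mask.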

Related benchmarks

Task	Dataset	Result
Video Generation Quality Evaluation	EvalVerse	Machine Win Ratio: 81	172
Text-to-Video	ShotVerse-Bench	Motion Type Appropriateness: 4.324	12
Multi-shot Video Generation	ShotVerse-Bench	Semantic Consistency (Global): 0.297	7
Multi-shot video storytelling	ST-Bench	Aesthetic Quality: 56.53	5
Intra-shot prompt-following alignment	Pillar 2 Intra-shot prompt-following alignment	Intra-shot Character Presence: 88.2	4
Cross-shot consistency	Pillar 3 Cross-shot consistency	CS Consistency (Face): 75.1	4
Intra-shot quality evaluation	EntityBench	Subject Consistency: 86	4
Video Generation	EntityBench Cross-shot 1.0	Cross-shot Face Consistency: 75.1	4
Video Generation	EntityBench Intra-shot 1.0	Imaging Quality: 49.97	4
Multi-shot Video Generation	Multi-shot Video Benchmark 15s	Aesthetic Score: 0.5842	3
