
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

About

State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives that are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within shots but sparse between them) ensures the efficiency required for minute-scale generation. Beyond setting a new state of the art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated filmmaking, making end-to-end cinematic creation a tangible future. Our code is available at: https://holo-cine.github.io/.
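
The two attention patterns named in the abstract can be pictured as boolean masks over a multi-shot token sequence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `summary_tokens` parameter, and the rule that cross-shot attention is limited to the first few tokens of each other shot are all hypothetical choices made for illustration. The abstract only states that prompts are localized to shots and that self-attention is dense within shots and sparse between them.

```python
from typing import List


def shot_ids(shot_lengths: List[int]) -> List[int]:
    """Map every position in a concatenated sequence to its shot index."""
    return [s for s, n in enumerate(shot_lengths) for _ in range(n)]


def window_cross_attention_mask(
    video_lens: List[int], text_lens: List[int]
) -> List[List[bool]]:
    """mask[q][k] is True when video token q may attend to text token k.

    Assumption: a video token only sees the prompt tokens of its own shot.
    """
    v_ids, t_ids = shot_ids(video_lens), shot_ids(text_lens)
    return [[v == t for t in t_ids] for v in v_ids]


def sparse_inter_shot_mask(
    video_lens: List[int], summary_tokens: int = 1
) -> List[List[bool]]:
    """mask[q][k] is True when video token q may attend to video token k.

    Dense inside a shot. Across shots, only the first `summary_tokens`
    tokens of each other shot are visible (a hypothetical sparsity rule).
    """
    ids = shot_ids(video_lens)
    # Offset of each token inside its own shot.
    offsets = [i for n in video_lens for i in range(n)]
    return [
        [q_shot == k_shot or k_off < summary_tokens
         for k_shot, k_off in zip(ids, offsets)]
        for q_shot in ids
    ]


if __name__ == "__main__":
    # Two shots: 3 and 2 video tokens, with 2 and 1 prompt tokens.
    cross = window_cross_attention_mask(video_lens=[3, 2], text_lens=[2, 1])
    self_mask = sparse_inter_shot_mask(video_lens=[3, 2], summary_tokens=1)
    for row in self_mask:
        print("".join("x" if allowed else "." for allowed in row))
```

With this toy rule, the number of allowed cross-shot key positions grows with the number of shots rather than with the total sequence length, which is the kind of saving that a dense-within, sparse-between pattern is meant to provide.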

Yihao Meng, Hao Ouyang, Yue Yu, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Hanlin Wang, Yixuan Li, Cheng Chen, Yanhong Zeng, Yujun Shen, Huamin Qu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Generation Quality Evaluation | EvalVerse | Machine Win Ratio: 81 | 172 |
| Text-to-Video | ShotVerse-Bench | Motion Type Appropriateness: 4.324 | 12 |
| Multi-shot Video Generation | ShotVerse-Bench | Semantic Consistency (Global): 0.297 | 7 |
| Multi-shot video storytelling | ST-Bench | Aesthetic Quality: 56.53 | 5 |
| Intra-shot prompt-following alignment | Pillar 2 Intra-shot prompt-following alignment | Intra-shot Character Presence: 88.2 | 4 |
| Cross-shot consistency | Pillar 3 Cross-shot consistency | CS Consistency (Face): 75.1 | 4 |
| Intra-shot quality evaluation | EntityBench | Subject Consistency: 86 | 4 |
| Video Generation | EntityBench Cross-shot 1.0 | Cross-shot Face Consistency: 75.1 | 4 |
| Video Generation | EntityBench Intra-shot 1.0 | Imaging Quality: 49.97 | 4 |
| Multi-shot Video Generation | Multi-shot Video Benchmark 15s | Aesthetic Score: 0.5842 | 3 |