SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control
About
The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing. In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines the output by exploiting high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Super-Resolution | UDM10 | PSNR | 22.78 | 88 |
| Video Super-Resolution | SPMCS | PSNR | 21.01 | 61 |
| Video Super-Resolution | RealVSR | PSNR | 18.44 | 28 |
| Video Super-Resolution | YouHQ40 | PSNR | 22.12 | 10 |
| Video Generation | SmartDirector Custom Benchmark Single-Shot 1.0 | FVD | 41.12 | 2 |
| Video Generation | SmartDirector Custom Benchmark Multi-Shot 1.0 | FVD | 65.65 | 2 |
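
For readers unfamiliar with the PSNR column in the table above, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), in decibels. It is not the evaluation code behind these benchmark entries: the exact protocol (color space, cropping, per-frame averaging) is not stated here, and the function name `psnr`, the flat pixel-list input, and the default 8-bit peak of 255 are assumptions made for illustration.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumption: pixels are passed as flat sequences and max_value is the
    peak intensity (255 for 8-bit images). Benchmarks may use another setup.
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)

    # Identical inputs have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [100.0, 100.0, 100.0, 100.0]
    estimate = [110.0, 110.0, 110.0, 110.0]  # constant error of 10 intensity levels
    print(f"PSNR: {psnr(ground_truth, estimate):.2f} dB")
```

Higher PSNR means the restored frame is closer to the reference in a pixel-wise sense, whereas for FVD lower values are better, so the two metrics in the table read in opposite directions.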