SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
About
Recently, video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessment criteria for smooth and creative transitions: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. Project page: https://vchitect.github.io/SEINE-project/
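The abstract describes one masked model serving three tasks: transition between two scene images, image-to-video animation, and autoregressive prediction. A minimal sketch of how the per-frame conditioning masks for these tasks might differ is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convention that 1 marks a given frame and 0 a frame to be generated, and the zero fill value are all hypothetical.

```python
from typing import List, Sequence


def frame_mask(num_frames: int, known: Sequence[int]) -> List[int]:
    """Build a per-frame binary mask.

    Assumed convention (not from the paper): 1 = frame is supplied as a
    condition, 0 = frame must be generated by the model.
    """
    mask = [0] * num_frames
    for idx in known:
        if not -num_frames <= idx < num_frames:
            raise IndexError(f"frame index {idx} out of range for {num_frames} frames")
        mask[idx] = 1
    return mask


def transition_mask(num_frames: int) -> List[int]:
    # Transition: the first and last frames (the two scene images) are given.
    return frame_mask(num_frames, [0, -1])


def animation_mask(num_frames: int) -> List[int]:
    # Image-to-video animation: only the first frame is given.
    return frame_mask(num_frames, [0])


def prediction_mask(num_frames: int, context: int) -> List[int]:
    # Autoregressive prediction: the first `context` frames come from the
    # previously generated clip; the rest are generated.
    return frame_mask(num_frames, range(context))


def apply_mask(frames: Sequence[float], mask: Sequence[int], fill: float = 0.0) -> List[float]:
    # Keep conditioned frames and blank out the ones to be generated.
    # Each "frame" is a single float here purely as a stand-in for a latent.
    return [f if m else fill for f, m in zip(frames, mask)]


if __name__ == "__main__":
    print(transition_mask(5))
    print(animation_mask(4))
    print(prediction_mask(6, 2))
```

Under these assumptions, the only thing that changes between the three tasks is which frame positions are marked as known, which is one way to read the claim that the model "can be readily extended" beyond transitions.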
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Generation | VBench | Quality Score | 70.97 | 102 |
| Video Generation | Physics-IQ | Phys. IQ Score | 29.13 | 45 |
| Image-to-Video Generation | VBench I2V 1.0 (test) | Subject Consistency | 96.57 | 13 |
| Video Generation | Kinetics-600 | FVD | 332.8 | 4 |
| Transition Video Generation | Webvid10M (test) | LPIPS (First Frame) | 0.4332 | 3 |
| Transition Video Generation | User Study | User Preference Score | 11.6 | 3 |