# Lumiere: A Space-Time Diffusion Model for Video Generation

## About
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
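To make the core architectural idea concrete, below is a minimal, illustrative sketch (not the authors' released code) of a Space-Time U-Net: 3D convolutions down-sample and up-sample the clip in both space and time, so the network operates on a compact space-time representation and emits every frame in a single pass. All module names, hyper-parameters, and the toy encoder-decoder layout are assumptions for illustration; the real model also inflates a pre-trained text-to-image backbone and adds attention layers, which are omitted here.

```python
# Illustrative sketch of a Space-Time U-Net (STUNet) idea; all names and
# hyper-parameters are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class SpaceTimeDown(nn.Module):
    """Halve spatial resolution AND temporal duration (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        # A strided 3D conv compresses the clip jointly over (T, H, W),
        # unlike spatial-only video U-Nets that keep all frames at full rate.
        self.conv = nn.Conv3d(channels, channels * 2,
                              kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return self.conv(x)


class SpaceTimeUp(nn.Module):
    """Double spatial resolution AND temporal duration (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.ConvTranspose3d(channels, channels // 2,
                                       kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)


class TinySTUNet(nn.Module):
    """Toy space-time encoder-decoder; skip connections, text conditioning,
    and the pre-trained image backbone are omitted for brevity."""

    def __init__(self, in_channels: int = 3, base: int = 32):
        super().__init__()
        self.stem = nn.Conv3d(in_channels, base, 3, padding=1)
        self.down1, self.down2 = SpaceTimeDown(base), SpaceTimeDown(base * 2)
        self.up1, self.up2 = SpaceTimeUp(base * 4), SpaceTimeUp(base * 2)
        self.head = nn.Conv3d(base, in_channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.stem(x)
        h = self.down2(self.down1(h))  # coarsest space-time scale
        h = self.up2(self.up1(h))
        return self.head(h)            # all frames produced in one pass


if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 64, 64)  # (batch, channels, T, H, W)
    out = TinySTUNet()(clip)
    print(out.shape)                      # torch.Size([1, 3, 16, 64, 64])
```

The point of the sketch is the contrast the abstract draws: because down-sampling acts on the temporal axis as well, the bottleneck holds the whole clip at once, rather than a set of distant keyframes that a separate temporal super-resolution stage must later interpolate.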
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Generation | Physics-IQ | Phys. IQ Score | 23 | 45 |
| Text-to-Video Generation | UCF-101 (test) | FVD | 332.5 | 25 |
| Background layer reconstruction | OmnimatteRF benchmark, synthetic movie scenes (test) | PSNR | 29.04 | 13 |
| Background layer reconstruction | OmnimatteRF benchmark, synthetic Kubric scenes (test) | PSNR | 31.46 | 6 |
| Physical Plausibility Evaluation | Physics-IQ (modified) | Solid Mechanics Score | 27.3 | 6 |