Video Probabilistic Diffusion Models in Projected Latent Space

About

Despite remarkable progress in deep generative models, synthesizing high-resolution, temporally coherent videos remains challenging because of their high dimensionality, complex temporal dynamics, and large spatial variations. Recent work on diffusion models has shown potential to address this challenge, but these models suffer from severe computational and memory inefficiency that limits their scalability. To address this issue, we propose a novel generative model for videos, which we call projected latent video diffusion models (PVDM): a probabilistic diffusion model that learns a video distribution in a low-dimensional latent space and can therefore be trained efficiently on high-resolution videos with limited resources. Specifically, PVDM has two components: (a) an autoencoder that projects a given video into 2D-shaped latent vectors, factorizing the complex cubic structure of video pixels, and (b) a diffusion model architecture specialized for this factorized latent space, together with a training/sampling procedure that synthesizes videos of arbitrary length with a single model. Experiments on popular video generation datasets demonstrate the superiority of PVDM over previous video synthesis methods; e.g., PVDM achieves an FVD score of 639.7 on the UCF-101 long video (128 frames) generation benchmark, improving on the prior state-of-the-art score of 1773.4 (lower FVD is better).
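The factorization in component (a) can be illustrated with array shapes alone. The sketch below is a hypothetical simplification, not PVDM's encoder: it collapses a (C, T, H, W) video along time, height, and width by plain averaging, whereas the paper's autoencoder learns its projection. The function names `factorize_video` and `latent_size` and the plane labels `z_s`, `z_h`, `z_w` are illustrative assumptions. It shows why three 2D planes are cheaper than the cubic grid: storage grows as H·W + T·W + T·H rather than T·H·W.

```python
import numpy as np


def factorize_video(video: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Collapse a (C, T, H, W) video into three 2D planes by mean-pooling.

    Hypothetical stand-in for a learned encoder: PVDM learns its projection,
    this only reproduces the resulting plane shapes.
    """
    z_s = video.mean(axis=1)  # (C, H, W): average over time
    z_h = video.mean(axis=2)  # (C, T, W): average over height
    z_w = video.mean(axis=3)  # (C, T, H): average over width
    return z_s, z_h, z_w


def latent_size(c: int, t: int, h: int, w: int) -> int:
    """Number of values stored by the three planes: C * (H*W + T*W + T*H)."""
    return c * (h * w + t * w + t * h)


if __name__ == "__main__":
    c, t, h, w = 4, 16, 32, 32
    video = np.random.default_rng(0).standard_normal((c, t, h, w))
    z_s, z_h, z_w = factorize_video(video)
    print(z_s.shape, z_h.shape, z_w.shape)  # (4, 32, 32) (4, 16, 32) (4, 16, 32)
    print(video.size, latent_size(c, t, h, w))  # 65536 8192
```

For this 16-frame, 32×32 example the three planes hold 8× fewer values than the full grid, and the gap widens as T, H, and W grow.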
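Training a diffusion model in a latent space follows the usual recipe of noising latents and learning to predict the injected noise. The sketch below assumes a standard DDPM-style forward process, q(z_t | z_0) = N(sqrt(ᾱ_t) z_0, (1 − ᾱ_t) I), with a linear β schedule (1e-4 to 0.02 over 1000 steps) and an ε-prediction loss. The abstract does not specify PVDM's schedule, parameterization, or network, so these choices and the helper names (`linear_alpha_bars`, `q_sample`, `noise_prediction_loss`) are assumptions. Here one timestep is shared across the three hypothetical planes of a video, treating them as a single latent sample.

```python
import numpy as np


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(z0: np.ndarray, t: int, alpha_bars: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I) given standard-normal noise."""
    a = alpha_bars[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise


def noise_prediction_loss(pred_noise: np.ndarray, noise: np.ndarray) -> float:
    """Epsilon-prediction objective: mean squared error against the injected noise."""
    return float(np.mean((pred_noise - noise) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = linear_alpha_bars()
    # Hypothetical factorized latent for a (C, T, H, W) = (4, 16, 32, 32) video.
    planes = [rng.standard_normal(s) for s in [(4, 32, 32), (4, 16, 32), (4, 16, 32)]]
    t = int(rng.integers(0, len(alpha_bars)))  # one timestep shared by all planes
    noises = [rng.standard_normal(p.shape) for p in planes]
    noisy = [q_sample(p, t, alpha_bars, n) for p, n in zip(planes, noises)]
    print([z.shape for z in noisy])  # a denoiser would receive these plus t
    # A trivial all-zero predictor scores roughly 1.0, since the noise is standard normal.
    print(noise_prediction_loss(np.zeros_like(noises[0]), noises[0]))
```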

Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Generation | UCF101 | FVD | 343.6 | 68 |
| Video Reconstruction | UCF-101 | rFVD | 66.5 | 39 |
| Video Generation | SkyTimelapse | FVD | 55.4 | 22 |
| Class-Conditional Video Generation | UCF-101 v1.0 (train test) | FVD | 343.6 | 21 |
| Class-conditioned Video Generation | UCF101 (test) | Fréchet Video Distance | 343.6 | 19 |
| Video Generation | FaceForensics | FVD | 355.9 | 15 |
| Video Generation | Sky | FVD | 75.5 | 14 |
| Unconditional video generation | UCF-101 256x256 | FVD (256x256, 2048) | 1.14e+3 | 12 |
| Video Generation | UCF | FVD | 1.14e+3 | 12 |
| Video Generation | Taichi-HD | FVD | 540.2 | 12 |

Showing 10 of 13 rows.
