VIDM: Video Implicit Diffusion Models

About

Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse set of images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition manner, i.e. one can sample plausible video motions according to the latent feature of frames. We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different number of frames. Results show that the proposed method outperforms the state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.

Kangfu Mei, Vishal M. Patel• 2022

Related benchmarks

Task	Dataset	Result
Video Generation	UCF-101 (test)	Inception Score64.17	105
Video Generation	UCF101	FVD263	68
Video Prediction	BAIR 64x64 (test)	FVD131.7	27
Video Generation	SkyTimelapse	FVD57.4	22
Class-Conditional Video Generation	UCF-101 v1.0 (train test)	FVD294.7	21
Video Generation	Video Generation	Sampling Time (s)192	21
Class-conditioned Video Generation	UCF101 (test)	Fréchet Video Distance294.7	19
Video Generation	UCF101 128x128 16 frames	Inception Score64.17	17
Video Generation	SkyTimelapse 256x256 (test)	FVD57.4	14
Video Generation	TaiChi-HD 128x128 (test)	FVD121.9	14

Showing 10 of 17 rows

Other info

Follow for update

@wizwand_team Discord