Stochastic Latent Residual Video Prediction

About

Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
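
The residual update rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`residual_rollout`, `toy_step`), the standard Gaussian noise, and the hand-written step function are assumptions made for illustration, standing in for learned networks. It only shows the first-order form y_{t+1} = y_t + f(y_t, z_{t+1}), which resembles an Euler step with unit step size.

```python
import math
import random


def residual_rollout(y0, step_fn, noise_dim, num_steps, rng):
    """Roll out latent states with y_{t+1} = y_t + step_fn(y_t, z_{t+1})."""
    states = [list(y0)]
    for _ in range(num_steps):
        y = states[-1]
        # Assumption: one standard Gaussian noise vector is drawn per step.
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        delta = step_fn(y, z)
        # Residual (first-order, Euler-like) update with unit step size.
        states.append([yi + di for yi, di in zip(y, delta)])
    return states


def toy_step(y, z):
    """Hypothetical stand-in for a learned network: bounded drift plus scaled noise."""
    return [0.1 * math.tanh(-yi) + 0.05 * zi for yi, zi in zip(y, z)]


rng = random.Random(0)
trajectory = residual_rollout([1.0, -0.5], toy_step, noise_dim=2, num_steps=10, rng=rng)
```

In a fully latent model of this kind, frames would be decoded from each latent state separately, so the rollout above never touches pixels; the decoder is omitted here.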

Jean-Yves Franceschi, Edouard Delasalles, Mickaël Chen, Sylvain Lamprier, Patrick Gallinari • 2020

Related benchmarks

Task	Dataset	Result
Video Prediction	BAIR (test)	FVD 162	59
Video Prediction	KTH	PSNR 29.69	35
Video Prediction	BAIR Push (test)	FVD 141.7	30
Video Prediction	KTH (test)	FVD 222	24
Future video prediction	BAIR 64x64 and 256x256 (test)	FVD 181	16
Video Prediction	BAIR 64x64	FVD 181	14
Video Synthesis	iPER (test)	FVD 245.1	11
Video Prediction	Moving MNIST two-digits (test)	PSNR 18.25	9
Video Prediction	Human3.6M (test)	FVD 174.7	9
Proxy-supervised Video Generation	BAIR 64x64 Full (test)	LPIPS 0.491	6

Showing 10 of 16 rows
