Video Pixel Networks

About

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
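The "four-dimensional dependency chain" can be pictured as a chain-rule ordering over the time, row, column and color indices of a video tensor: each value is predicted from every value that precedes it in that ordering. The sketch below is illustrative only and is not the VPN architecture. The function names `dependency_order` and `context_size`, the raster scan within a frame, and the R, G, B channel order are assumptions made for the example.

```python
from itertools import product

CHANNELS = ("R", "G", "B")  # assumed color order, for illustration only


def dependency_order(num_frames, height, width, channels=CHANNELS):
    """Yield (t, i, j, c) positions in an assumed chain-rule order:
    frame by frame, then row by row, column by column, channel by channel."""
    for t, i, j in product(range(num_frames), range(height), range(width)):
        for c in channels:
            yield (t, i, j, c)


def context_size(position, height, width, channels=CHANNELS):
    """Number of values that come earlier in the ordering, i.e. how many
    values the conditional distribution at this position may depend on."""
    t, i, j, c = position
    pixels_before = (t * height + i) * width + j
    return pixels_before * len(channels) + channels.index(c)


# A tiny 2-frame, 2x2 RGB video has 2 * 2 * 2 * 3 = 24 values to model.
order = list(dependency_order(num_frames=2, height=2, width=2))
for step, position in enumerate(order):
    # Each position conditions on exactly the positions generated before it.
    assert context_size(position, height=2, width=2) == step
```

Under this ordering the joint log-likelihood of a video is the sum of the per-position conditional log-probabilities, which is what lets a model of this kind be trained and evaluated with an exact likelihood.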

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu • 2016

Related benchmarks

Task	Dataset	Result
Video Prediction	KTH 10 -> 20 steps (test)	PSNR 23.76	102
Video Prediction	Moving MNIST	SSIM 0.87	83
Video Prediction	Moving MNIST (test)	MSE 64.1	82
Video Prediction	Moving-MNIST 10 → 10 (test)	MSE 64.1	39
Traffic Flow Prediction	TaxiBJ	--	13
Spatiotemporal Predictive Learning	Moving MNIST 10 time steps 2-digit (test)	SSIM 87	11
Spatiotemporal Predictive Learning	Moving MNIST 10 time steps 3-digit (test)	SSIM 0.734	11
Video Prediction	Moving-MNIST 10 → 30 (test)	MSE 129.6	8

