
Video Pixel Networks

About

We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
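The "four-dimensional dependency chain" in the abstract can be read as a chain-rule factorization over time, the two spatial axes, and color. The sketch below is only an illustration of that idea, not the authors' implementation: it assumes raster-scan order within each frame, and `conditional_log_prob` and `uniform_log_prob` are hypothetical stand-ins for the neural network that would predict each value from everything generated before it.

```python
import itertools
import math


def dependency_order(T, H, W, C=3):
    """Return (t, i, j, c) indices in chain order:
    frames first, then rows, then columns, then color channels.
    The raster-scan order within a frame is an assumption."""
    return itertools.product(range(T), range(H), range(W), range(C))


def joint_log_prob(video, conditional_log_prob):
    """Sum log p(x[t][i][j][c] | all earlier values) over the 4-D chain.

    `video` is a nested list indexed as video[t][i][j][c] holding
    discrete intensities. `conditional_log_prob(value, context)` is a
    hypothetical placeholder for the model; `context` is the list of
    values that precede `value` in the chain.
    """
    T, H, W, C = len(video), len(video[0]), len(video[0][0]), len(video[0][0][0])
    context = []
    total = 0.0
    for t, i, j, c in dependency_order(T, H, W, C):
        value = video[t][i][j][c]
        total += conditional_log_prob(value, context)
        context.append(value)  # later values may depend on this one
    return total


def uniform_log_prob(value, context, levels=256):
    """Toy conditional that ignores the context: uniform over 256 levels."""
    return -math.log(levels)


# Usage: a 2-frame, 2x2, 3-channel video has 24 values in the chain.
video = [[[[0, 0, 0] for _ in range(2)] for _ in range(2)] for _ in range(2)]
log_p = joint_log_prob(video, uniform_log_prob)
```

With the uniform placeholder, every one of the 24 values contributes `-log(256)`, so the joint log-probability is simply their sum. A trained model would replace the placeholder with a context-dependent distribution.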

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu • 2016

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Prediction | KTH 10 -> 20 steps (test) | PSNR 23.76 | 88 |
| Video Prediction | Moving MNIST (test) | MSE 64.1 | 82 |
| Video Prediction | Moving MNIST | SSIM 0.87 | 52 |
| Video Prediction | Moving-MNIST 10 → 10 (test) | MSE 64.1 | 39 |
| Traffic Flow Prediction | TaxiBJ | -- | 13 |
| Spatiotemporal Predictive Learning | Moving MNIST 10 time steps 2-digit (test) | SSIM 87 | 11 |
| Spatiotemporal Predictive Learning | Moving MNIST 10 time steps 3-digit (test) | SSIM 0.734 | 11 |
| Video Prediction | Moving-MNIST 10 → 30 (test) | MSE 129.6 | 8 |
