Video Pixel Networks
About
We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.
Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu • 2016
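The "four-dimensional dependency chain" in the abstract can be read as a chain-rule factorization of the joint distribution over every pixel value in the video. The block below is a sketch under stated assumptions, not a formula quoted from this page: the index names $t, i, j, c$, the sizes $T, H, W$, and the ordering (frames first, then rows, then columns, then color channels) are chosen here for illustration.

```latex
% Assumed notation: x is a video tensor with T frames of H x W pixels
% and 3 color channels; x_{<(t,i,j,c)} denotes every value that comes
% before position (t,i,j,c) in the assumed ordering.
\[
  p(\mathbf{x})
  = \prod_{t=1}^{T} \prod_{i=1}^{H} \prod_{j=1}^{W} \prod_{c=1}^{3}
    p\!\left(x_{t,i,j,c} \,\middle|\, \mathbf{x}_{<(t,i,j,c)}\right)
\]
```

Because the model estimates a discrete distribution, each factor would be a categorical distribution over the possible intensity values of a single color channel (256 values if the video is 8-bit, which is an assumption here). The chain rule itself is exact for any ordering; the ordering only determines which values each conditional is allowed to see.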
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Prediction | KTH 10 -> 20 steps (test) | PSNR | 23.76 | 88 |
| Video Prediction | Moving MNIST (test) | MSE | 64.1 | 82 |
| Video Prediction | Moving MNIST | SSIM | 0.87 | 52 |
| Video Prediction | Moving-MNIST 10 → 10 (test) | MSE | 64.1 | 39 |
| Traffic Flow Prediction | TaxiBJ | -- | -- | 13 |
| Spatiotemporal Predictive Learning | Moving MNIST 10 time steps 2-digit (test) | SSIM | 87 | 11 |
| Spatiotemporal Predictive Learning | Moving MNIST 10 time steps 3-digit (test) | SSIM | 0.734 | 11 |
| Video Prediction | Moving-MNIST 10 → 30 (test) | MSE | 129.6 | 8 |