High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

About

Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex inductive biases inside network architectures with highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question if such handcrafted architectures are necessary and instead propose a different approach: finding minimal inductive bias for video prediction while maximizing network capacity. We investigate this question by performing the first large-scale empirical study and demonstrate state-of-the-art performance by learning large models on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling car driving.

Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee • 2019

Related benchmarks

Task              Dataset                         Result                   Rank
Video Prediction  Human3.6M 4 frames -> 4 frames  PSNR 32.11               20
Blue button       VP2 benchmark                   Mean Success Rate 97.33  7
Open slide        VP2 benchmark                   Mean Success Rate 57.33  7
Red button        VP2 benchmark                   Mean Success Rate 76     7
Robosuite push    VP2 benchmark                   Mean Success Rate 79.8   7
open drawer       VP2 benchmark                   Mean Success Rate 16.67  7
Video Prediction  RoboNet                         FVD 123.2                7
Green button      VP2 benchmark                   Mean Success Rate 81.33  7
Upright block     VP2                             Mean Success Rate 48.67  7
Video Prediction  RoboNet (test)                  FVD 123.2                7
Showing 10 of 11 rows
