
Transformation-based Adversarial Video Prediction on Large-Scale Data

About

Recent breakthroughs in adversarial generative modeling have led to models capable of producing video samples of high quality, even on large and complex datasets of real-world video. In this work, we focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence. We first improve the state of the art by performing a systematic empirical study of discriminator decompositions and proposing an architecture that yields faster convergence and higher performance than previous approaches. We then analyze recurrent units in the generator, and propose a novel recurrent unit which transforms its past hidden state according to predicted motion-like features, and refines it to handle dis-occlusions, scene changes and other complex behavior. We show that this recurrent unit consistently outperforms previous designs. Our final model leads to a leap in state-of-the-art performance, obtaining a test set Fréchet Video Distance of 25.7, down from 69.2, on the large-scale Kinetics-600 dataset.
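The abstract describes a recurrent unit that warps its past hidden state according to predicted motion-like features and then refines the warped state with a gated update. The sketch below is a toy NumPy illustration of that idea, not the paper's architecture: the per-pixel linear maps (`W_flow`, `W_gate`, `W_cand`), the nearest-neighbour warp, and all shapes are illustrative assumptions standing in for learned convolutions and a differentiable warp.

```python
import numpy as np

def warp(h, flow):
    # Nearest-neighbour warp of hidden state h (H, W, C) by integer flow (H, W, 2).
    # Stand-in for the differentiable warping a real model would use.
    H, W, _ = h.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(ys - flow[..., 0], 0, H - 1).astype(int)
    src_x = np.clip(xs - flow[..., 1], 0, W - 1).astype(int)
    return h[src_y, src_x]

class TransformationRecurrentUnit:
    """Toy sketch: transform the past state by predicted motion, then refine.

    Hypothetical 1x1 'convolutions' are modelled as per-pixel linear maps.
    """
    def __init__(self, channels, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W_flow = rng.normal(scale=0.1, size=(2 * channels, 2))
        self.W_gate = rng.normal(scale=0.1, size=(2 * channels, channels))
        self.W_cand = rng.normal(scale=0.1, size=(2 * channels, channels))

    def step(self, x, h_prev):
        inp = np.concatenate([x, h_prev], axis=-1)       # (H, W, 2C)
        flow = np.round(inp @ self.W_flow)               # motion-like features
        h_warp = warp(h_prev, flow)                      # transform past state
        inp2 = np.concatenate([x, h_warp], axis=-1)
        gate = 1.0 / (1.0 + np.exp(-(inp2 @ self.W_gate)))  # refinement gate
        cand = np.tanh(inp2 @ self.W_cand)               # fresh content for regions
                                                         # the warp cannot explain
        return gate * h_warp + (1.0 - gate) * cand
```

The refinement step mirrors the abstract's motivation: warping alone cannot account for dis-occlusions or scene changes, so a gated candidate state fills in content the transformed past state cannot provide.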

Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam Doron, Albin Cassirer, Karen Simonyan • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Prediction | BAIR (test) | FVD | 103.3 | 59 |
| Video Prediction | Kinetics-600 (test) | FVD | 25.7 | 46 |
| Video Frame Prediction | Kinetics-600 | gFVD | 25.7 | 38 |
| Video Prediction | BAIR Robot Pushing | FVD | 103 | 38 |
| Video Prediction | BAIR | FVD | 103.3 | 34 |
| Video Prediction | BAIR 64x64 (test) | FVD | 103.3 | 27 |
| Video Generation | Kinetics-600 | FVD | 25.74 | 22 |
| Video Generation | BAIR | FVD | 103 | 22 |
| Video Prediction | Kinetics-600 | FVD | 25.7 | 18 |
| Future video prediction | BAIR 64x64 and 256x256 (test) | FVD | 103 | 16 |

Showing 10 of 15 rows
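All results above are reported as Fréchet Video Distance (FVD), the Fréchet distance between Gaussians fit to video features (in the standard setup, activations from a pretrained I3D network). A minimal sketch of the distance computation itself, assuming two arbitrary feature matrices of shape `(N, D)`:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fit to two feature sets (N, D)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)       # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real           # discard tiny imaginary parts from numerics
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

Lower is better: identical feature distributions give a distance of zero, which is why the drop from 69.2 to 25.7 on Kinetics-600 represents a large improvement.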
