VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation
About
Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally, as in the case of pixel-level autoregressive models, or do not directly optimize the likelihood of the data. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows, which allows for direct optimization of the data likelihood and produces high-quality stochastic predictions. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.
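
To illustrate what "direct optimization of the data likelihood" means for a normalizing flow, here is a minimal sketch of the change-of-variables computation. It is not the VideoFlow architecture: it assumes a single elementwise affine transform with a standard normal base distribution, and the names `affine_flow_log_likelihood`, `shift`, and `log_scale` are hypothetical, chosen only for this example.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a scalar z under N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_log_likelihood(x, shift, log_scale):
    """Exact log p(x) for an elementwise affine flow x = shift + exp(log_scale) * z.

    Assumption: this toy flow stands in for a real invertible network.
    Change of variables: log p(x) = log p_z(f(x)) + log |det df/dx|,
    where f(x) = (x - shift) * exp(-log_scale) and the log-determinant
    of its (diagonal) Jacobian is -sum(log_scale).
    """
    total = 0.0
    for x_i, b_i, s_i in zip(x, shift, log_scale):
        z_i = (x_i - b_i) * math.exp(-s_i)          # inverse pass: data -> latent
        total += standard_normal_logpdf(z_i) - s_i  # base density + log-det term
    return total


if __name__ == "__main__":
    x = [0.3, -1.2]
    print(affine_flow_log_likelihood(x, shift=[0.1, 0.5], log_scale=[0.2, -0.4]))
```

Because every term is an exact, differentiable function of the parameters, the negative of this quantity can serve directly as a training loss, which is the property the abstract contrasts with models that do not directly optimize the likelihood.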
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Prediction | BAIR Push (test) | FVD | 95 | 30 |
| Future video prediction | BAIR 64x64 and 256x256 (test) | FVD | 131 | 16 |
| Video Prediction | BAIR 64x64 | FVD | 131 | 14 |
| Video modeling | BAIR Robot Pushing (test) | -- | -- | 14 |
| Video Generation | Bair | FVD Score | 124.8 | 7 |
| Video Generation | Stochastic Movement Dataset (test) | Fooling Rate | 31.8 | 3 |
| Video modeling | BAIR Robotic Pushing | Bits/Dim | 1.87 | 3 |
| Video Generation | BAIR action-free (test) | Bits-per-pixel | 1.87 | 1 |
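
The bits-per-dimension figures in the table are a rescaled negative log-likelihood. As a rough sketch, assuming the model reports a negative log-likelihood in nats summed over all dimensions of one example, the conversion divides by the number of dimensions and by ln 2. The helper name `bits_per_dim` and the 64x64 RGB frame size used below are illustrative assumptions, not details taken from the benchmark entries.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a total negative log-likelihood (in nats) to bits per dimension.

    Assumption: nll_nats is summed over all num_dims dimensions of a single
    example; any dequantization offsets are already included in nll_nats.
    """
    if num_dims <= 0:
        raise ValueError("num_dims must be positive")
    return nll_nats / (num_dims * math.log(2.0))


if __name__ == "__main__":
    # Hypothetical example: one 64x64 RGB frame has 64 * 64 * 3 dimensions.
    dims = 64 * 64 * 3
    nll = 1.87 * dims * math.log(2.0)  # nats that correspond to 1.87 bits/dim
    print(bits_per_dim(nll, dims))
```

Lower values are better under this metric, since they correspond to a higher likelihood assigned to the held-out data.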