Decomposing Motion and Content for Natural Video Sequence Prediction

About

We propose a deep neural network for the prediction of future frames in natural video sequences. To effectively handle complex evolution of pixels in videos, we propose to decompose the motion and content, two key components generating dynamics in videos. Our model is built upon the Encoder-Decoder Convolutional Neural Network and Convolutional LSTM for pixel-level prediction, which independently capture the spatial layout of an image and the corresponding temporal dynamics. By independently modeling motion and content, predicting the next frame reduces to converting the extracted content features into the next frame content by the identified motion features, which simplifies the task of prediction. Our model is end-to-end trainable over multiple time steps, and naturally learns to decompose motion and content without separate training. We evaluate the proposed network architecture on human activity videos using KTH, Weizmann action, and UCF-101 datasets. We show state-of-the-art performance in comparison to recent approaches. To the best of our knowledge, this is the first end-to-end trainable network architecture with motion and content separation to model the spatiotemporal dynamics for pixel-level future prediction in natural videos.

Ruben Villegas, Jimei Yang, Seunghoon Hong, Xunyu Lin, Honglak Lee • 2017
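
The data flow described in the abstract can be illustrated with a toy sketch. It is not the authors' network: the function names are hypothetical, and it assumes that content is read from the last observed frame and motion from differences between consecutive frames, which the abstract does not spell out. A constant-velocity rule stands in for the learned Encoder-Decoder CNN and Convolutional LSTM, so only the separation of the two streams and the multi-step rollout are shown.

```python
def frame_difference(a, b):
    """Pixel-wise difference a - b for two frames given as lists of rows."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_streams(frames):
    """Split observed frames into a content input and motion inputs.

    Assumption (not stated in the abstract): content comes from the last
    observed frame, motion from consecutive frame differences.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    content_input = frames[-1]
    motion_inputs = [
        frame_difference(frames[i], frames[i - 1])
        for i in range(1, len(frames))
    ]
    return content_input, motion_inputs


def toy_predict_next(content_input, motion_inputs):
    """Placeholder for the learned model: shift content by the latest motion."""
    last_motion = motion_inputs[-1]
    return [
        [c + m for c, m in zip(row_c, row_m)]
        for row_c, row_m in zip(content_input, last_motion)
    ]


def rollout(frames, steps):
    """Predict `steps` future frames, feeding each prediction back as input."""
    history = list(frames)
    predictions = []
    for _ in range(steps):
        content_input, motion_inputs = split_streams(history)
        next_frame = toy_predict_next(content_input, motion_inputs)
        predictions.append(next_frame)
        history.append(next_frame)
    return predictions
```

Calling `rollout(observed_frames, steps=10)` on a list of 2D frames returns ten predicted frames. In a real model, `toy_predict_next` would be replaced by trained encoders and a decoder.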

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Prediction | KTH 10 -> 20 steps (test) | PSNR 25.95 | 88 |
| Video Prediction | Moving MNIST (test) | MSE 50.1 | 82 |
| Video Prediction | KTH 10 -> 40 steps (test) | PSNR 23.89 | 77 |
| Video Prediction | Caltech Pedestrian 10 -> 1 (test) | SSIM 0.879 | 31 |
| Next-frame prediction | CalTech Pedestrian transfer from KITTI (test) | SSIM 87.9 | 29 |
| Future video prediction | Cityscapes Next frame | MS-SSIM 0.897 | 13 |
| Future video prediction | Cityscapes Next 10 frames | LPIPS 0.451 | 13 |
| Future video prediction | Cityscapes Next 5 frames | MS-SSIM 0.706 | 13 |
| Video Prediction | MMNIST | MSE 0.0425 | 12 |
| Future video prediction | KITTI Next 3 frames | LPIPS 0.317 | 11 |
Showing 10 of 32 rows
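
Several rows above report MSE and PSNR. As a rough guide to how these two metrics are conventionally defined, here is a minimal sketch in plain Python. It assumes 8-bit intensities (peak value 255) and per-pixel averaging. The individual benchmarks may normalise differently, for example by summing over pixels or scaling to [0, 1], so it is not expected to reproduce the listed numbers.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized frames (lists of rows)."""
    if len(pred) != len(target) or any(
        len(rp) != len(rt) for rp, rt in zip(pred, target)
    ):
        raise ValueError("frames must have the same shape")
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / err)
```

Higher PSNR and lower MSE both indicate a prediction closer to the ground-truth frame.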
