Future Video Synthesis with Object Motion Prediction
About
We present an approach to predicting future video frames from a sequence of consecutive past frames. Instead of synthesizing images directly, our approach models the complex scene dynamics by decoupling the background scene from the moving objects. The future appearance of each scene component is predicted by non-rigid deformation of the background and affine transformation of the moving objects. The predicted appearances are then composited into a plausible future video. With this procedure, our method exhibits far fewer tearing and distortion artifacts than other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state of the art in visual quality and accuracy.
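The decoupling described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`backward_warp`, `affine_warp`, `composite`), the nearest-neighbour sampling, and the toy 8x8 grayscale frame are all assumptions made for illustration. In the paper the flow field and the affine parameters would be predicted by learned networks; here they are set by hand.

```python
import numpy as np

def backward_warp(image, flow):
    """Non-rigid deformation: output pixel (x, y) samples the input at
    (x + dx, y + dy), where flow[y, x] = (dx, dy). Nearest-neighbour
    sampling with border clamping (an illustrative simplification)."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def affine_warp(image, A):
    """Affine transformation: A is a 2x3 matrix mapping source (x, y, 1)
    to target coordinates. Implemented by inverse mapping; target pixels
    whose source falls outside the image are left at zero."""
    h, w = image.shape[:2]
    M_inv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = M_inv @ target
    sx = np.rint(src[0]).astype(int).reshape(h, w)
    sy = np.rint(src[1]).astype(int).reshape(h, w)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(image)
    out[valid] = image[sy[valid], sx[valid]]
    return out

def composite(background, obj, mask):
    """Paste the warped object over the warped background using its mask."""
    return mask * obj + (1.0 - mask) * background

# Toy scene (hypothetical data): flat gray background, bright 2x2 object.
background = np.full((8, 8), 0.5)
obj = np.zeros((8, 8))
obj[2:4, 1:3] = 1.0
mask = (obj > 0).astype(float)

flow = np.zeros((8, 8, 2))             # static background in this toy case
A = np.array([[1.0, 0.0, 3.0],         # object moves 3 pixels to the right
              [0.0, 1.0, 0.0]])

frame = composite(backward_warp(background, flow),
                  affine_warp(obj, A),
                  affine_warp(mask, A))
```

Warping the mask with the same affine matrix as the object keeps the two aligned, so the background only shows through where the moved object is absent.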
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Future video prediction | Cityscapes Next 5 frames | MS-SSIM | 0.757 | 13 |
| Future video prediction | Cityscapes Next 10 frames | LPIPS | 0.2328 | 13 |
| Future video prediction | Cityscapes Next frame | MS-SSIM | 0.891 | 13 |
| Future video prediction | KITTI Next 3 frames | LPIPS | 0.246 | 11 |
| Video Prediction | Cityscapes 9 (test) | MS-SSIM (t+1) | 89.1 | 11 |
| Video Prediction | Cityscapes | MS-SSIM (t+1) | 89.1 | 11 |
| Video Prediction | KITTI 12 (test) | MS-SSIM (t+1) | 79.28 | 9 |
| Video Prediction | KITTI | MS-SSIM (t+1) | 79.28 | 9 |
| Video Prediction | Cityscapes (test) | MS-SSIM (t+1) | 89.1 | 7 |
| Future video prediction | KITTI Next frame | MS-SSIM | 0.7928 | 6 |