Future Video Synthesis with Object Motion Prediction

About

We present an approach to predict future video frames given a sequence of consecutive past frames. Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics by decoupling the background scene from moving objects. The future appearance of each scene component is predicted by non-rigid deformation of the background and affine transformation of moving objects. The predicted appearances are then combined to produce a plausible future video. With this procedure, our method exhibits far fewer tearing and distortion artifacts than other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state of the art in terms of visual quality and accuracy.

Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen • 2020
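
The abstract describes moving objects being transformed affinely and then combined with the predicted background. The snippet below is only a minimal sketch of that compositing idea, not the authors' implementation: the helper names `affine_warp` and `composite`, the nearest-neighbor sampling, and the hand-picked transform are all assumptions for illustration. In the paper the transform is predicted by a model and the background is deformed non-rigidly, neither of which is shown here.

```python
import numpy as np

def affine_warp(image, mask, A, t):
    """Warp an object layer by x_dst = A @ x_src + t (hypothetical helper).

    Uses inverse mapping with nearest-neighbor sampling: for every output
    pixel, look up the source pixel at A^{-1} (x_dst - t).
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([xs.ravel(), ys.ravel()], axis=0).astype(np.float64)  # 2 x N, (x, y) order
    src = np.linalg.inv(A) @ (dst - t.reshape(2, 1))
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    # Keep only output pixels whose source location lies inside the image.
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    dy, dx = ys.ravel()[valid], xs.ravel()[valid]
    warped_img = np.zeros_like(image)
    warped_mask = np.zeros_like(mask)
    warped_img[dy, dx] = image[sy[valid], sx[valid]]
    warped_mask[dy, dx] = mask[sy[valid], sx[valid]]
    return warped_img, warped_mask

def composite(background, obj_img, obj_mask):
    """Paste the warped object over the background wherever its mask is set."""
    out = background.copy()
    m = obj_mask.astype(bool)
    out[m] = obj_img[m]
    return out

# Toy example: a 2x2 object on an empty 8x8 background.
background = np.zeros((8, 8))
obj_img = np.zeros((8, 8))
obj_mask = np.zeros((8, 8))
obj_img[2:4, 2:4] = 1.0
obj_mask[2:4, 2:4] = 1.0

A = np.eye(2)              # assumed transform: no rotation or scaling
t = np.array([3.0, 1.0])   # assumed shift: 3 px right, 1 px down

warped_img, warped_mask = affine_warp(obj_img, obj_mask, A, t)
frame = composite(background, warped_img, warped_mask)
```

In this toy case the object moves from rows 2-3, columns 2-3 to rows 3-4, columns 5-6 of `frame`. Handling each object as its own layer is what lets the background and the objects be predicted separately before they are merged.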

Related benchmarks

Task	Dataset	Result
Future video prediction	Cityscapes Next 5 frames	MS-SSIM 0.757	13
Future video prediction	Cityscapes Next 10 frames	LPIPS 0.2328	13
Future video prediction	Cityscapes Next frame	MS-SSIM 0.891	13
Future video prediction	KITTI Next 3 frames	LPIPS 0.246	11
Video Prediction	Cityscapes 9 (test)	MS-SSIM (t+1) 89.1	11
Video Prediction	Cityscapes	MS-SSIM (t+1) 89.1	11
Video Prediction	KITTI 12 (test)	MS-SSIM (t+1) 79.28	9
Video Prediction	KITTI	MS-SSIM (t+1) 79.28	9
Video Prediction	Cityscapes (test)	MS-SSIM (t+1) 89.1	7
Future video prediction	KITTI Next frame	MS-SSIM 0.7928	6

Showing 10 of 14 rows
