
Learning Semantic-Aware Dynamics for Video Prediction

About

We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions and capturing the evolution of semantically consistent regions in the video. The scene layout (semantic map) and motion (optical flow) are decomposed into layers, which are predicted and fused with their context to generate future layouts and motions. The appearance of the scene is warped from past frames using the predicted motion in co-visible regions; dis-occluded regions are synthesized with content-aware inpainting utilizing the predicted scene layout. The result is a predictive model that explicitly represents objects and learns their class-specific motion, which we evaluate on video prediction benchmarks.
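The prediction step described above can be sketched in a few lines of NumPy: co-visible pixels are backward-warped from the past frame using the predicted flow, and pixels whose source falls outside the co-visible region are treated as dis-occluded and filled using the predicted semantic layout. This is a minimal illustrative sketch, not the paper's implementation: nearest-neighbor warping and a constant per-class fill color stand in for the learned warping and content-aware inpainting networks, and the function names (`backward_warp`, `predict_frame`) are invented for this example.

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp `frame` (H, W, C) to the next time step using backward optical
    flow `flow` (H, W, 2), with nearest-neighbor sampling. Pixels whose
    source location falls outside the image are marked dis-occluded."""
    H, W = frame.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.round(xs + flow[..., 0]).astype(int)
    src_y = np.round(ys + flow[..., 1]).astype(int)
    visible = (src_x >= 0) & (src_x < W) & (src_y >= 0) & (src_y < H)
    warped = np.zeros_like(frame)
    warped[visible] = frame[src_y[visible], src_x[visible]]
    return warped, visible

def predict_frame(frame, flow, semantic_map, class_colors):
    """Compose the predicted frame: warp the co-visible region, then fill
    each dis-occluded pixel according to its predicted semantic class
    (a toy stand-in for the learned content-aware inpainting)."""
    warped, visible = backward_warp(frame, flow)
    out = warped.copy()
    for cls, color in class_colors.items():
        hole = (~visible) & (semantic_map == cls)
        out[hole] = color
    return out
```

In the actual model, both `flow` and `semantic_map` are themselves predicted per layer and fused with context before this compose step; here they are simply given as inputs.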

Xinzhu Bei, Yanchao Yang, Stefano Soatto • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Prediction | Cityscapes 9 (test) | MS-SSIM (t+1) | 95.99 | 11 |
| Video Prediction | Cityscapes | MS-SSIM (t+1) | 95.99 | 11 |
| Video Prediction | KITTI Flow | PSNR | 24.47 | 9 |
| Video Prediction | KITTI 12 (test) | MS-SSIM (t+1) | 83.06 | 9 |
| Video Prediction | KITTI | MS-SSIM (t+1) | 83.06 | 9 |
| Semantic Map Prediction | Cityscapes (test) | mIoU (t+5) | 70.3 | 7 |
| Video Prediction | Cityscapes (test) | MS-SSIM (t+1) | 95.99 | 7 |
| Video Prediction | KITTI Raw | MS-SSIM (t+1) | 83.06 | 5 |
