Transframer: Arbitrary Frame Prediction with Generative Models

About

We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames, and outputs sequences of sparse, compressed image features. Transframer achieves state-of-the-art results on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data.

Charlie Nash, João Carreira, Jacob Walker, Iain Barr, Andrew Jaegle, Mateusz Malinowski, Peter Battaglia • 2022

Related benchmarks

Task	Dataset	Result
Video Prediction	Kinetics-600 (test)	FVD 25.4	46
Video Frame Prediction	Kinetics-600	gFVD 25.4	38
Video Prediction	BAIR Robot Pushing	FVD 100	38
Video Prediction	Bair	FVD 100	34
Video Generation	Kinetics-600	FVD 25.4	22
Video Generation	Bair	FVD Score 100	22
Video Prediction	Kinetics-600	FVD 25.4	18
Frame prediction	Bair	FVD 100	15
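
The results above are reported as FVD (Fréchet Video Distance), where lower is better. As a rough illustration of what such a score measures, the sketch below computes the Fréchet distance between Gaussians fitted to two sets of feature vectors. It is not the authors' evaluation code: the function name `frechet_distance` and its arguments are hypothetical, and the video feature extractor that FVD applies before this step (a pretrained video network) is omitted, so the inputs are assumed to be precomputed features.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are arrays of shape (num_samples, feature_dim). They are
    assumed to be features already extracted from real and generated videos;
    the extractor itself is outside the scope of this sketch.
    """
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite inputs. Clip small negative round-off values.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Two identical feature sets give a distance of approximately zero, and shifting every feature vector by a constant offset changes only the mean term, so the distance becomes the squared norm of that offset.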
