A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

About

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate high dimensional frames at each time step. The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.

Marco Fraccaro, Simon Kamronn, Ulrich Paquet, Ole Winther• 2017

Related benchmarks

Task	Dataset	Result
Time Series Forecasting	Traffic (test)	--	334
Time Series Forecasting	Electricity (test)	--	237
Time Series Forecasting	Solar (test)	CRPS0.389	19
Time Series Forecasting	Wiki (test)	CRPS0.317	19
Time Series Forecasting	Exchange (test)	CRPS0.018	19
Video Forecasting	2D Bouncing Ball	LPIPS@200.053	7
Regime Estimation	2D Bouncing Ball	Input F110.5	6

Showing 7 of 7 rows

Other info

Follow for update

@wizwand_team Discord