Structured Object-Aware Physics Prediction for Video Modeling and Planning

About

When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models from videos in an unsupervised fashion is an unsolved research problem. In this paper, we present STOVE, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions. It is constructed by combining an image model and a dynamics model in a compositional manner, and it improves on previous work by reusing the dynamics model for inference, which accelerates and regularizes training. STOVE predicts videos with convincing physical behavior over hundreds of timesteps, outperforms previous unsupervised models, and even approaches the performance of supervised baselines. We further demonstrate the strength of our model as a simulator for sample-efficient model-based control in a task with heavily interacting objects.
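
To make the idea of an object-aware state concrete, here is a minimal sketch of a rollout over per-object positions and velocities with pairwise interactions. It is not STOVE's architecture: the abstract describes a learned dynamics model combined with an image model, whereas this toy uses a hand-written repulsion rule and no images. All names (`ObjectState`, `pairwise_effect`, `step`, `rollout`) and all constants are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical per-object state: 2D position and velocity."""
    x: float
    y: float
    vx: float
    vy: float


def pairwise_effect(a: ObjectState, b: ObjectState,
                    strength: float = 0.05,
                    radius: float = 1.0) -> Tuple[float, float]:
    """Velocity change applied to `a` because of `b`.

    A hand-written short-range repulsion, standing in for whatever a
    learned interaction model would output (an assumption, not the paper's rule).
    """
    dx, dy = a.x - b.x, a.y - b.y
    dist = (dx * dx + dy * dy) ** 0.5
    if dist == 0.0 or dist >= radius:
        return 0.0, 0.0  # no effect outside the interaction radius
    scale = strength * (radius - dist) / dist
    return scale * dx, scale * dy


def step(states: List[ObjectState], dt: float = 0.1) -> List[ObjectState]:
    """Advance every object by one timestep, summing effects over all pairs."""
    next_states = []
    for i, a in enumerate(states):
        dvx = dvy = 0.0
        for j, b in enumerate(states):
            if i != j:
                ex, ey = pairwise_effect(a, b)
                dvx += ex
                dvy += ey
        vx, vy = a.vx + dvx, a.vy + dvy
        next_states.append(ObjectState(a.x + dt * vx, a.y + dt * vy, vx, vy))
    return next_states


def rollout(states: List[ObjectState], steps: int) -> List[List[ObjectState]]:
    """Return the initial states followed by `steps` predicted states."""
    trajectory = [states]
    for _ in range(steps):
        states = step(states)
        trajectory.append(states)
    return trajectory


if __name__ == "__main__":
    start = [ObjectState(0.0, 0.0, 1.0, 0.0), ObjectState(0.5, 0.0, -1.0, 0.0)]
    trajectory = rollout(start, steps=100)
    print(len(trajectory), trajectory[-1])
```

Because the state is a list of explicit per-object quantities, the same `step` function can be applied repeatedly to predict far into the future, which is the property the abstract highlights for long rollouts and for use as a simulator in planning.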

Jannik Kossen, Karl Stelzner, Marcel Hussing, Claas Voelcker, Kristian Kersting • 2019

Related benchmarks

Task	Dataset	Result
Push and Switch	OpenAI Fetch - Push and Switch 3-Push + 3-Switch (S+O) (test)	Success Rate: 71.5	18
Push	OpenAI Fetch Push 3-Push (L+O) (test)	Success Rate: 91.5	9
Push and Switch	OpenAI Fetch - Push and Switch 2-Push + 2-Switch (L+S) (test)	Success Rate: 59.5	9
3-Push	Push & Switch	Success Rate: 95.4	9
2-Push	Push & Switch	Success Rate: 97.3	9
Push and Switch	OpenAI Fetch - Push and Switch 2-Push + 2-Switch (S) (test)	Success Rate: 80.8	9
Switch	OpenAI Fetch 3-Switch (L+O) (test)	Success Rate: 77.2	9
Object Comparison	Spriteworld	Success Rate: 72.4	9
Push	OpenAI Fetch Push 2-Push (L) (test)	Success Rate: 93.1	9
2-Switch	Push & Switch	Success Rate: 91.6	9

Showing 10 of 27 rows
