Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

About

Learning long-term dynamics models is the key to understanding physical common sense. Most existing approaches to learning dynamics from visual input sidestep long-term predictions by resorting to rapid re-planning with short-term models. This not only requires such models to be extremely accurate but also limits them to tasks where an agent can continuously obtain feedback and take action at each step until completion. In this paper, we aim to leverage ideas from success stories in visual recognition tasks to build object representations that can capture inter-object and object-environment interactions over a long range. To this end, we propose Region Proposal Interaction Networks (RPIN), which reason about each object's trajectory in a latent region-proposal feature space. Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin both in prediction quality and in its ability to plan for downstream tasks, and it also generalizes well to novel environments. Code, pre-trained models, and more visualization results are available at https://haozhi.io/RPIN.

Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, Jitendra Malik • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Physical Reasoning | PHYRE-1B cross-template (test) | AUCCESS | 42.2 | 7 |
| Physical Reasoning | PHYRE-1B within-template (test) | AUCCESS | 85.2 | 7 |
| Physical Reasoning | PHYRE Within-template 1.0 | Success Rate (AUCCESS) | 85.49 | 6 |
| Physical Reasoning | PHYRE Cross-template 1.0 | Success Rate | 50.86 | 6 |
| Physical Reasoning | PHYRE within-template B | AUCCESS | 85.49 | 5 |
| Trajectory Prediction | ShapeStacks (SS) (train) | Average Prediction Error | 1.03 | 5 |
| Trajectory Prediction | RealB (train) | Average Prediction Error | 0.3 | 5 |
| Trajectory Prediction | SimB (train) | Average Prediction Error | 2.55 | 5 |
| Trajectory Prediction | ShapeStacks (SS) (t ∈ [T_train, 2 × T_train]) | Average Prediction Error | 4.73 | 5 |
| Trajectory Prediction | RealB (t ∈ [T_train, 2 × T_train]) | Average Prediction Error | 2.34 | 5 |
Showing 10 of 19 rows
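
The physical reasoning rows report AUCCESS, which this page does not define. As a rough sketch, assuming the definition used by the PHYRE benchmark (a weighted average of the fraction of tasks solved within k attempts, for k = 1..100, with weights w_k = log(k + 1) - log(k)), the score could be computed as follows. The function name `auccess` and the input format (one first-success attempt index per task, or `None` if never solved) are hypothetical choices for illustration, not part of any published code.

```python
import math


def auccess(solved_at, max_attempts=100):
    """Sketch of AUCCESS, assuming the PHYRE-style definition.

    solved_at: one entry per task, giving the 1-based attempt index at
    which the task was first solved, or None if it was never solved.
    Returns a score in [0, 100].
    """
    if not solved_at:
        raise ValueError("need at least one task")

    # w_k = log(k + 1) - log(k): early attempts are weighted more heavily.
    weights = [math.log(k + 1) - math.log(k) for k in range(1, max_attempts + 1)]
    total_weight = sum(weights)

    score = 0.0
    for k, w in enumerate(weights, start=1):
        # s_k: fraction of tasks solved within the first k attempts.
        solved = sum(1 for a in solved_at if a is not None and a <= k)
        score += w * solved / len(solved_at)

    return 100.0 * score / total_weight
```

Under this assumed definition, an agent that solves every task on its first attempt scores 100, and one that never solves a task scores 0; solving later lowers the score because the weights decay with k.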
