Structural-RNN: Deep Learning on Spatio-Temporal Graphs
About
Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have an underlying high-level structure and can benefit from it. Spatio-temporal graphs are a popular tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence-learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can be used to transform any spatio-temporal graph by following a certain set of well-defined steps. Evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state of the art by a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Motion Prediction | Human3.6M | MAE (1000ms) | 2.13 | 46 |
| Human Pose Prediction | Human 3.6M Subject 5 (test) | -- | -- | 24 |
| Long-term Motion Prediction | H3.6M Smoking | MAE (1000ms) | 3.23 | 12 |
| Long-term Motion Prediction | H3.6M Discussion | MAE (1000ms) | 2.43 | 12 |
| Anticipation | CAD-120 | Sub-activity F1 Score | 65.6 | 8 |
| Sub-activity detection | CAD-120 leave-one-subject-out (cross-val) | F1 Score | 83.2 | 7 |
| 3D Human Pose Prediction | Human 3.6M (Subject 5) | Walking MAE (80ms) | 0.81 | 7 |
| Object affordance detection | CAD-120 leave-one-subject-out (cross-val) | F1 Score | 91.1 | 6 |
| Vehicle Behavior Prediction | Apollo Scape (test) | MAU Recall (Moving Away) | 76 | 5 |
| Detection | CAD-120 | Sub-activity F1 | 83.2 | 4 |