Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture
About
Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We train our architecture in a sequence-to-sequence prediction manner, and it explicitly learns to predict the future given only a partial temporal context. We further introduce a novel loss layer for anticipation which prevents over-fitting and encourages early anticipation. We use our architecture to anticipate driving maneuvers several seconds before they happen on a natural driving data set of 1180 miles. The context for maneuver anticipation comes from multiple sensors installed on the vehicle. Our approach shows significant improvement over the state-of-the-art in maneuver anticipation by increasing the precision from 77.4% to 90.5% and recall from 71.2% to 87.4%.
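The abstract mentions a loss layer that encourages early anticipation but does not give its form. As a minimal sketch, assuming a per-time-step negative log-likelihood whose weight grows exponentially toward the end of the sequence, the idea might look like the following. The function name `anticipation_loss`, the weight `exp(-(T - t))`, and the input layout are illustrative assumptions, not details taken from the text above.

```python
import math


def anticipation_loss(probs, label):
    """Time-weighted negative log-likelihood for a single sequence.

    probs: list of length T; probs[t-1] is the list of predicted class
           probabilities after observing the first t time steps.
    label: index of the true maneuver class.

    Assumption (not stated in the abstract): the weight at step t is
    exp(-(T - t)), so mistakes made with little context are penalized
    lightly and mistakes made near the end of the sequence are
    penalized at nearly full strength.
    """
    T = len(probs)
    total = 0.0
    for t, p in enumerate(probs, start=1):
        weight = math.exp(-(T - t))  # equals 1.0 at the final step
        total += -weight * math.log(p[label])
    return total


# Two hypothetical two-step sequences with true class 0:
early_mistake = [[0.1, 0.9], [0.9, 0.1]]  # wrong at first, right at the end
late_mistake = [[0.9, 0.1], [0.1, 0.9]]   # right at first, wrong at the end
```

Under this assumed weighting, `late_mistake` incurs a larger loss than `early_mistake`, which reflects the intuition that the model may be uncertain when it has seen little context but should commit to the correct maneuver as more context arrives.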
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Future Trajectory Prediction | SDD (Stanford Drone Dataset) (test) | -- | -- | 51 |
| Early Action Recognition | ActivityNet (test) | Top-1 Action Accuracy | 70.5 | 48 |
| Action Anticipation | Epic-Kitchen 55 (val) | -- | -- | 33 |
| Early Action Recognition | EPIC-KITCHENS (val) | Top-1 Accuracy | 31.46 | 32 |
| Action Anticipation | EGTEA Gaze+ (val) | Top-5 Action Accuracy | 72.38 | 27 |
| Egocentric Action Anticipation | EPIC-KITCHENS (val) | Top-5 Action Accuracy @ 1.0s | 28.6 | 17 |
| Future Trajectory Prediction | KITTI (test) | Error (m) | 4.29 | 16 |
| Egocentric Action Anticipation | EPIC-KITCHENS (test) | Top-5 Action Accuracy @ 1s | 28.56 | 11 |
| Action Anticipation | ActivityNet | Top-5 Acc (Ta=1.0s) | 67.05 | 10 |
| Early Action Recognition | EGTEA Gaze+ | Top-1 Acc (12.5%) | 40.31 | 10 |