Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting
About
This paper presents a novel vehicle motion forecasting method based on multi-head attention. It produces joint forecasts for all vehicles in a road scene as sequences of multi-modal probability density functions of their positions. Its architecture uses multi-head attention to account for complete interactions between all vehicles, and long short-term memory layers for encoding and forecasting. It relies solely on vehicle position tracks, does not need maneuver definitions, and does not represent the scene with a spatial grid. This makes it more versatile than similar models while combining many forecasting capabilities, namely joint forecasting with interactions, uncertainty estimation, and multi-modality. The resulting prediction likelihood outperforms state-of-the-art models on the same dataset.
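As a rough illustration of how multi-head attention can let every vehicle attend to every other vehicle, the sketch below applies scaled dot-product attention across a set of per-vehicle encodings. It is not the paper's implementation: the function names, the random projection weights, the 8-dimensional encodings (standing in for LSTM encoder outputs), and the omission of a final output projection are all assumptions made for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(features, num_heads, rng):
    """Attend across vehicles. features has shape (n_vehicles, d_model)."""
    d_model = len(features[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head has its own query, key, and value projections (random here).
        q = matmul(features, random_matrix(d_model, d_head, rng))
        k = matmul(features, random_matrix(d_model, d_head, rng))
        v = matmul(features, random_matrix(d_model, d_head, rng))
        # weights[i][j]: how much vehicle i attends to vehicle j in this head.
        weights = [
            softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head) for kj in k])
            for qi in q
        ]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)
    # Concatenate the head outputs for each vehicle.
    combined = [sum((head[i] for head in head_outputs), []) for i in range(len(features))]
    return combined, head_weights


rng = random.Random(0)
encodings = random_matrix(4, 8, rng)  # 4 vehicles, assumed 8-dim encodings
combined, weights = multi_head_attention(encodings, num_heads=2, rng=rng)
print(len(combined), len(combined[0]))  # one vector per vehicle
```

Because every vehicle's query is compared with every vehicle's key, no spatial grid or neighbourhood definition is needed, which is the property the abstract emphasizes. In a trained model the projections would be learned, and the combined vectors would feed the forecasting layers.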
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Trajectory Prediction | Argoverse (test) | -- | 36 |
| Motion forecasting | Argoverse 1 (test) | b-minFDE (K=6): 2.12 | 30 |
| Motion forecasting | Argoverse Motion Forecasting 1.1 (test) | minADE (K=1): 1.74 | 27 |
| Trajectory Prediction | Argoverse 1.0 (test) | minADE (k=6): 0.98 | 15 |
| Trajectory Prediction | Argoverse Motion Forecasting Leaderboard 1.0 (test) | minADE (6): 1 | 12 |
| Motion forecasting | Argoverse (test) | minFDE (K=1): 4.24 | 12 |
| Ego-only motion forecasting | Argoverse (test) | minADE (6h): 0.98 | 7 |
| Motion trajectory prediction | Argoverse Leaderboard ADE@1 top (test) | ADE: 1.68 | 5 |
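
For readers unfamiliar with the metrics in the table, the sketch below shows how minADE and minFDE are commonly computed for a single agent with K predicted trajectories: the average (ADE) or final (FDE) Euclidean displacement is evaluated per predicted mode and the best mode is kept. The function names and the toy trajectories are illustrative assumptions; the exact benchmark protocol (forecast horizon, agent selection) is not reproduced, and the probability-weighted b-minFDE variant is not covered.

```python
import math


def displacement_errors(prediction, ground_truth):
    """Per-timestep Euclidean distance between two 2D trajectories."""
    return [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(prediction, ground_truth)]


def min_ade(predictions, ground_truth):
    """Lowest average displacement error over the K predicted modes."""
    return min(
        sum(errors) / len(errors)
        for errors in (displacement_errors(p, ground_truth) for p in predictions)
    )


def min_fde(predictions, ground_truth):
    """Lowest final-position error over the K predicted modes."""
    return min(displacement_errors(p, ground_truth)[-1] for p in predictions)


# Toy example with K=2 modes over three timesteps (made-up values).
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predictions = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away: ADE 1.0, FDE 2.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # closer: ADE 1/3, FDE 1.0
]
print(min_ade(predictions, ground_truth), min_fde(predictions, ground_truth))
```

With K=1 the minimum is taken over a single trajectory, so minADE and minFDE reduce to plain ADE and FDE.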