BEVTraj: Map-Free End-to-End Trajectory Prediction in Bird's-Eye View with Deformable Attention and Sparse Goal Proposals
About
In autonomous driving, trajectory prediction is essential for safe and efficient navigation. While recent methods often rely on high-definition (HD) maps to provide structured environmental priors, such maps are costly to maintain, geographically limited, and unreliable in dynamic or unmapped scenarios. Directly leveraging raw sensor data in Bird's-Eye View (BEV) space offers greater flexibility, but BEV features are dense and unstructured, making agent-centric spatial reasoning challenging and computationally inefficient. To address this, we propose Bird's-Eye View Trajectory Prediction (BEVTraj), a map-free framework that employs deformable attention to adaptively aggregate task-relevant context from sparse locations in dense BEV features. We further introduce a Sparse Goal Candidate Proposal (SGCP) module that predicts a small set of realistic goals, enabling fully end-to-end multimodal forecasting without heuristic post-processing. Extensive experiments show that BEVTraj achieves performance comparable to state-of-the-art HD map-based methods while providing greater robustness and flexibility without relying on pre-built maps. The source code is available at https://github.com/Kongminsang/bevtraj.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Trajectory Prediction | Argoverse 2 (val) | Miss Rate21.44 | 11 | |
| Multi-agent Trajectory Prediction | nuScenes (val) | AvgMinADE61.9285 | 5 | |
| Trajectory Prediction | nuScenes (val) | minADE (5s)1.3953 | 5 |