ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

About

Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose ST-P3, a spatial-temporal feature learning scheme that produces a set of more representative features for perception, prediction and planning tasks simultaneously. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-art methods on both the open-loop nuScenes dataset and the closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.
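
As a rough illustration of the egocentric alignment idea mentioned in the abstract, the sketch below maps 3D points from a past ego frame into the current ego frame with a planar rigid transform (yaw plus translation). It is not the ST-P3 implementation: the function name `align_to_current_ego` and the pose convention (x, y, yaw of each ego frame in a shared world frame) are assumptions made for illustration only.

```python
import math


def align_to_current_ego(points, past_pose, current_pose):
    """Map (x, y, z) points from a past ego frame into the current ego frame.

    Poses are hypothetical (x, y, yaw) tuples expressed in a shared world
    frame; yaw is in radians. Only planar ego motion is modeled here.
    """
    px, py, pyaw = past_pose
    cx, cy, cyaw = current_pose
    aligned = []
    for x, y, z in points:
        # Past ego frame -> world frame (rotate by past yaw, then translate).
        wx = px + x * math.cos(pyaw) - y * math.sin(pyaw)
        wy = py + x * math.sin(pyaw) + y * math.cos(pyaw)
        # World frame -> current ego frame (inverse rigid transform).
        dx, dy = wx - cx, wy - cy
        ex = dx * math.cos(cyaw) + dy * math.sin(cyaw)
        ey = -dx * math.sin(cyaw) + dy * math.cos(cyaw)
        # Height is unchanged by a planar motion.
        aligned.append((ex, ey, z))
    return aligned
```

Once points from several past frames share the current ego frame, they can be accumulated before any projection to a bird's eye view grid, which is the general motivation the abstract gives for aligning first.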

Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, Dacheng Tao • 2022

Related benchmarks

Task | Dataset | Result | Rank
Semantic segmentation | nuScenes (val) | -- | 212
Open-loop planning | nuScenes (val) | L2 Error (3s): 0.75 | 151
Open-loop planning | nuScenes v1.0 (val) | L2 (1s): 1.33 | 59
Planning | nuScenes (val) | Collision Rate (Avg): 71 | 52
Autonomous Driving | CARLA Town05 (Long) | DS: 11.5 | 46
Planning | nuScenes v1.0-trainval (val) | ST-P3 L2 Error (1s): 1.33 | 39
Open-loop planning | nuScenes v1.0 (test) | L2 Error (1s): 1.59 | 28
Instance-aware occupancy flow prediction | nuScenes (val) | IoU: 38.9 | 26
BeV Segmentation | nuScenes v1.0 (val) | Drivable Area: 75.97 | 25
Trajectory Prediction | nuScenes v1.0-trainval (val) | L2 Error (1s): 1.33 | 23
Showing 10 of 33 rows
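
Several rows above report an L2 error between planned and ground-truth trajectories at a time horizon. The sketch below shows one way such a number can be computed. It is an assumption-laden illustration, not the evaluation protocol behind any specific row: the function name `l2_error`, the default 2 Hz waypoint rate, and the `average` switch are all hypothetical. Some protocols average the displacement over all waypoints up to the horizon, while others report the displacement at the horizon waypoint only, so values computed under different conventions are not directly comparable.

```python
import math


def l2_error(pred, gt, horizon_s, hz=2.0, average=True):
    """L2 displacement error between two (x, y) waypoint lists.

    pred, gt  : sequences of (x, y) waypoints, first entry at t = 1 / hz.
    horizon_s : evaluation horizon in seconds.
    hz        : waypoint rate (assumed 2 Hz here; adjust to the dataset).
    average   : True  -> mean error over all waypoints up to the horizon,
                False -> error at the horizon waypoint only.
    """
    steps = int(round(horizon_s * hz))
    if steps < 1 or steps > min(len(pred), len(gt)):
        raise ValueError("horizon does not fit the provided trajectories")
    # Per-waypoint Euclidean distance in the ground plane.
    dists = [
        math.hypot(p[0] - g[0], p[1] - g[1])
        for p, g in zip(pred[:steps], gt[:steps])
    ]
    return sum(dists) / steps if average else dists[-1]
```

Calling `l2_error(pred, gt, horizon_s=1.0)` and `l2_error(pred, gt, horizon_s=1.0, average=False)` on the same pair of trajectories makes the gap between the two conventions visible.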
