TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving

About

How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g. object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.

Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, Andreas Geiger• 2022

Related benchmarks

Task	Dataset	Result
Autonomous Driving	NAVSIM v1 (test)	NC98.3	147
Autonomous Driving Planning	NAVSIM v1	NC97.8	126
Autonomous Driving Planning	NAVSIM v1 (test)	NC98	118
Autonomous Driving Planning	NAVSIM navhard v2	NC96.2	88
Autonomous Driving Planning	NAVSIM (navtest)	NC97.8	68
Autonomous Driving	NAVSIM (test)	PDMS84	62
End-to-end Planning	NAVSIM v1	NC97.7	61
Planning	NAVSIM (test)	PDMS84	59
Planning	NAVSIM (navtest)	NC99.2	53
Autonomous Driving	CARLA Town05 (Long)	DS61.18	53

Showing 10 of 107 rows

...

Other info

Code

Follow for update

@wizwand_team Discord