Leveraging Transformer Decoder for Automotive Radar Object Detection
About
In this paper, we present a Transformer-based architecture for 3D radar object detection that uses a novel Transformer Decoder as the prediction head to directly regress 3D bounding boxes and class scores from radar feature representations. To bridge multi-scale radar features and the decoder, we propose Pyramid Token Fusion (PTF), a lightweight module that converts a feature pyramid into a unified, scale-aware token sequence. By formulating detection as a set prediction problem with learnable object queries and positional encodings, our design models long-range spatio-temporal correlations and cross-feature interactions. This approach eliminates dense proposal generation and heuristic post-processing such as extensive non-maximum suppression (NMS) tuning. We evaluate the proposed framework on the RADDet dataset, where it achieves significant improvements over state-of-the-art radar-only baselines.
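The abstract describes PTF as converting a multi-scale feature pyramid into a single scale-aware token sequence for the decoder. The sketch below illustrates that idea only in outline: each pyramid level is projected to a common channel width, flattened into tokens, and tagged with a per-level embedding before concatenation. The function name, shapes, and the random stand-ins for learned projections are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def pyramid_token_fusion(feature_maps, d_model=32, seed=0):
    """Flatten a multi-scale feature pyramid into one token sequence.

    Each level of shape (C, H, W) is projected to d_model channels,
    flattened to H*W tokens, and offset by a per-level scale embedding
    so the decoder can distinguish levels. The random projections here
    stand in for learned 1x1 convolutions.
    """
    rng = np.random.default_rng(seed)
    tokens = []
    for fmap in feature_maps:
        c, h, w = fmap.shape
        proj = rng.standard_normal((c, d_model)) / np.sqrt(c)  # 1x1-conv stand-in
        flat = fmap.reshape(c, h * w).T @ proj                 # (H*W, d_model)
        scale_emb = rng.standard_normal(d_model) * 0.02        # per-level embedding
        tokens.append(flat + scale_emb)
    # Unified token sequence: (sum of H*W over all levels, d_model)
    return np.concatenate(tokens, axis=0)

# Hypothetical three-level pyramid from a radar backbone
pyramid = [np.ones((64, 32, 32)), np.ones((128, 16, 16)), np.ones((256, 8, 8))]
seq = pyramid_token_fusion(pyramid)
print(seq.shape)  # (32*32 + 16*16 + 8*8, 32) = (1344, 32)
```

A decoder's learnable object queries would then cross-attend over `seq`, which is what lets one prediction head consume all scales jointly.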
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 2D Object Detection | RADDet Range-Doppler map | AP@0.5: 55.91 | 7 |
| 3D Object Detection | RADDet (test) | AP@0.4: 53.75 | 7 |
| 2D Object Detection | RADDet Range-Azimuth map | AP@0.5: 0.5538 | 7 |