SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

About

3D object detection in point clouds is a core component for modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherently sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. Built upon the idea of window-based Transformers, SWFormer converts 3D points into sparse voxels and windows, and then processes these variable-length sparse windows efficiently using a bucketing scheme. In addition to self-attention within each spatial window, our SWFormer also captures cross-window correlation with multi-scale feature fusion and window shifting operations. To further address the unique challenge of detecting 3D objects accurately from sparse features, we propose a new voxel diffusion technique. Experimental results on the Waymo Open Dataset show our SWFormer achieves state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian for 3D object detection on the official test set, outperforming all previous single-stage and two-stage models, while being much more efficient.

Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, Dragomir Anguelov • 2022
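
The abstract's pipeline of sparse voxels, windows, and bucketing can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 2D bird's-eye-view grid, the voxel and window sizes, and the bucket capacities are all assumptions chosen for illustration. It only shows the general idea that empty windows are never materialized and that windows with similar numbers of occupied voxels are grouped so each group could be padded to one length and processed as a batch.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Return the sorted list of occupied (x, y) voxel indices for the points."""
    occupied = {(floor(x / voxel_size), floor(y / voxel_size)) for x, y, *_ in points}
    return sorted(occupied)


def group_into_windows(voxels, window_size, shift=0):
    """Group occupied voxels by window index; empty windows never appear.

    A non-zero `shift` offsets the window grid, a rough stand-in for the
    window shifting mentioned in the abstract (hypothetical parameter).
    """
    windows = defaultdict(list)
    for vx, vy in voxels:
        key = ((vx + shift) // window_size, (vy + shift) // window_size)
        windows[key].append((vx, vy))
    return dict(windows)


def bucket_windows(windows, bucket_caps):
    """Assign each window to the smallest bucket capacity that fits it."""
    caps = sorted(bucket_caps)
    buckets = {cap: [] for cap in caps}
    for key, voxels in windows.items():
        for cap in caps:
            if len(voxels) <= cap:
                buckets[cap].append(key)
                break
        else:
            raise ValueError(f"window {key} has more voxels than the largest bucket")
    return buckets


# Hypothetical input: four (x, y, z) points, 0.5 m voxels, 4x4-voxel windows.
points = [(0.1, 0.2, 0.0), (0.15, 0.22, 0.1), (0.9, 0.1, 0.0), (3.7, 3.9, 0.5)]
voxels = voxelize(points, voxel_size=0.5)
windows = group_into_windows(voxels, window_size=4)
buckets = bucket_windows(windows, bucket_caps=(1, 4, 16))
```

In this toy input, the first two points share a voxel, so three voxels are occupied across two windows. Each window lands in the smallest bucket that holds it, which keeps padding low compared with padding every window to the largest possible size.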

Related benchmarks

Task                  Dataset                           Metric             Result  Rank
3D Object Detection   Waymo Open Dataset (val)          3D APH Vehicle L2  70.6    175
3D Object Detection   Waymo Open Dataset (test)         Vehicle L2 mAPH    74.7    105
3D Object Detection   Waymo Open Dataset (WOD) (val)    Vehicle L1 mAP     79.4    47
3D Object Detection   Waymo Open Dataset LEVEL_1 (val)  --                 --      46
3D Object Detection   Waymo Open Dataset LEVEL_2 (val)  --                 --      46
3D Object Detection   Waymo (val)                       Vehicle L2 AP      69.2    38
3D Object Detection   Waymo Open 100% (val)             Vehicle AP (L1)    77.8    36
3D Object Detection   Waymo Open Dataset 1.2 (val)      Vehicle mAP H L2   68.8    32
3D Object Detection   Waymo Open Dataset (WOD) (val)    Vehicle L1 3D AP   81      27
BEV Object Detection  Waymo Open Dataset (WOD) (val)    Vehicle L1 AP      92.6    5