SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
About
3D object detection in point clouds is a core component of modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherently sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection that takes full advantage of the sparsity of point clouds. Built upon the idea of window-based Transformers, SWFormer converts 3D points into sparse voxels and windows, and then processes these variable-length sparse windows efficiently using a bucketing scheme. In addition to self-attention within each spatial window, SWFormer also captures cross-window correlation with multi-scale feature fusion and window shifting operations. To further address the unique challenge of detecting 3D objects accurately from sparse features, we propose a new voxel diffusion technique. Experimental results on the Waymo Open Dataset show that SWFormer achieves a state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian for 3D object detection on the official test set, outperforming all previous single-stage and two-stage models while being much more efficient.
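The voxel, window, and bucket pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it works in 2D (bird's-eye view) only, and the function name `bucket_sparse_windows`, the parameters `voxel_size`, `window_size`, and `bucket_caps`, and the padding value `-1` are all assumptions chosen for illustration. It shows only how windows with different numbers of occupied voxels might be grouped into a few fixed-length buckets so that each bucket can be processed as a dense batch.

```python
import numpy as np

def bucket_sparse_windows(points, voxel_size=0.5, window_size=4,
                          bucket_caps=(2, 4, 8, 16)):
    """Illustrative sketch: group points into sparse voxels, windows, buckets.

    All names and defaults are hypothetical, not taken from the paper.
    Returns the occupied voxel coordinates and a dict mapping each bucket
    capacity to an array of voxel indices, padded with -1.
    """
    caps = np.asarray(sorted(bucket_caps))
    # A window holds at most window_size**2 voxels in this 2D sketch.
    assert caps[-1] >= window_size ** 2, "largest bucket must fit a full window"

    # Quantize x, y to integer voxel coordinates and keep occupied voxels only.
    voxel_coords = np.unique(
        np.floor(points[:, :2] / voxel_size).astype(np.int64), axis=0)

    # Each voxel falls into exactly one non-overlapping window.
    window_coords = voxel_coords // window_size
    _, inverse, counts = np.unique(
        window_coords, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # flatten; shape differs across NumPy versions

    # Assign each window to the smallest bucket whose capacity fits it.
    bucket_of_window = np.searchsorted(caps, counts, side="left")

    buckets = {}
    for b, cap in enumerate(caps):
        window_ids = np.flatnonzero(bucket_of_window == b)
        if window_ids.size == 0:
            continue
        # Pad every window in this bucket to the same length.
        padded = np.full((window_ids.size, cap), -1, dtype=np.int64)
        for row, w in enumerate(window_ids):
            members = np.flatnonzero(inverse == w)
            padded[row, :members.size] = members
        buckets[int(cap)] = padded
    return voxel_coords, buckets
```

In a sketch like this, self-attention would run once per bucket on a dense `(num_windows, cap)` batch, with the `-1` entries masked out, so the padding cost is bounded by the gap between a window's length and its bucket capacity rather than by the largest window in the scene.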
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | Waymo Open Dataset (val) | 3D APH Vehicle L2 | 70.6 | 175 |
| 3D Object Detection | Waymo Open Dataset (test) | Vehicle L2 mAPH | 74.7 | 105 |
| 3D Object Detection | Waymo Open Dataset (WOD) (val) | Vehicle L1 mAP | 79.4 | 47 |
| 3D Object Detection | Waymo Open Dataset LEVEL_1 (val) | -- | -- | 46 |
| 3D Object Detection | Waymo Open Dataset LEVEL_2 (val) | -- | -- | 46 |
| 3D Object Detection | Waymo (val) | Vehicle L2 AP | 69.2 | 38 |
| 3D Object Detection | Waymo Open 100% (val) | Vehicle AP (L1) | 77.8 | 36 |
| 3D Object Detection | Waymo Open Dataset 1.2 (val) | Vehicle mAPH L2 | 68.8 | 32 |
| 3D Object Detection | Waymo Open Dataset (WOD) (val) | Vehicle L1 3D AP | 81 | 27 |
| BEV Object Detection | Waymo Open Dataset (WOD) (val) | Vehicle L1 AP | 92.6 | 5 |