DeepInteraction: 3D Object Detection via Modality Interaction
About
Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design is fundamentally restricted, however, because it overlooks useful modality-specific information, which ultimately hampers model performance. To address this limitation, we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize this strategy, we design the DeepInteraction architecture, characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that the proposed method surpasses all prior art, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
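To make the contrast between fusion and interaction concrete, here is a toy sketch in plain Python. It is not the DeepInteraction implementation: the function names (`fuse`, `interact`, `encode`), the modality names `lidar` and `camera`, the additive update rule, and the `alpha` mixing weight are all assumptions chosen for illustration. The only point it shows is structural: fusion collapses two inputs into one representation, whereas interaction lets each stream absorb information from the other while both streams are kept.

```python
from typing import List, Tuple

Vector = List[float]


def fuse(lidar: Vector, camera: Vector) -> Vector:
    """Conventional fusion (illustrative): merge both modalities into ONE vector."""
    return [(l + c) / 2.0 for l, c in zip(lidar, camera)]


def interact(lidar: Vector, camera: Vector, alpha: float = 0.5) -> Tuple[Vector, Vector]:
    """One hypothetical interaction step.

    Each stream is updated with information from the other stream,
    but TWO separate representations are returned, not one.
    """
    new_lidar = [l + alpha * c for l, c in zip(lidar, camera)]
    new_camera = [c + alpha * l for l, c in zip(lidar, camera)]
    return new_lidar, new_camera


def encode(lidar: Vector, camera: Vector, num_layers: int = 2,
           alpha: float = 0.5) -> Tuple[Vector, Vector]:
    """Stack several interaction steps; both streams survive to the end."""
    for _ in range(num_layers):
        lidar, camera = interact(lidar, camera, alpha)
    return lidar, camera


if __name__ == "__main__":
    lidar_feat = [1.0, 2.0]
    camera_feat = [3.0, 4.0]
    print("fused:", fuse(lidar_feat, camera_feat))
    print("interacted:", encode(lidar_feat, camera_feat))
```

In this toy setting the fused output is a single vector, so any downstream stage can no longer tell which modality contributed what. The interaction output stays a pair, which is the property the abstract attributes to the proposed strategy.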
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Object Detection | nuScenes (val) | NDS | 75 | 941 |
| 3D Object Detection | nuScenes (test) | mAP | 75.6 | 829 |
| 3D Object Detection | nuScenes v1.0 (test) | mAP | 70.8 | 210 |
| 3D Object Detection | nuScenes v1.0 (val) | mAP (Overall) | 69.9 | 190 |
| 3D Object Detection | nuScenes-C Sunlight v1.0 (trainval) | mAP | 64.9 | 13 |
| 3D Object Detection | nuScenes-C Fog v1.0 (trainval) | mAP | 54.8 | 13 |
| 3D Object Detection | nuScenes-C Snow v1.0 (trainval) | mAP | 62.4 | 13 |
| 3D Object Detection | nuScenes Night (val) | mAP | 42.3 | 13 |
| 3D Object Detection | nuScenes Rainy (val) | mAP | 69.4 | 13 |
| 3D Object Detection | nuScenes Clean v1.0-trainval (val) | mAP | 69.9 | 12 |