DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
About
Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics. However, most existing approaches depend on high-resolution images and complex networks to achieve top performance, which hinders their deployment in practical scenarios. Moreover, current multi-sensor fusion approaches focus mainly on improving feature fusion and largely neglect effective supervision strategies for the fused features. To address these issues, we propose DAOcc, a novel multi-modal occupancy prediction framework that leverages 3D object detection supervision to improve performance while using a deployment-friendly image backbone and a practical input resolution. In addition, we introduce a BEV View Range Extension strategy to mitigate the performance degradation caused by lower image resolution. Extensive experiments demonstrate that DAOcc achieves new state-of-the-art results on both the Occ3D-nuScenes and Occ3D-Waymo benchmarks, outperforming previous state-of-the-art methods by a significant margin while using only a ResNet-50 backbone and a 256×704 input resolution. With TensorRT optimization, DAOcc reaches 104.9 FPS while maintaining 54.2 mIoU on an NVIDIA RTX 4090 GPU. Code is available at https://github.com/AlphaPlusTT/DAOcc.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Occupancy Prediction | Occ3D-nuScenes (val) | mIoU | 54.33 | 144 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes (val) | IoU | 45 | 31 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes rainy scenario (val) | mIoU | 29.65 | 26 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes night scenario (val) | mIoU (Mean IoU) | 18.53 | 22 |
| 3D Semantic Occupancy Prediction | nuScenes-C Camera Corruption v1.0 (val) | Clean Score | 45 | 12 |
| 3D Occupancy Prediction | Occ3D Waymo (val) | mIoU | 45.13 | 10 |
| 3D Semantic Occupancy Prediction | nuScenes-C Lidar Corruption | mIoU (Clean) | 27.73 | 10 |
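
Most results above are reported as mIoU, the per-class intersection-over-union averaged across semantic classes. As a rough illustration of the metric only, the following is a minimal sketch over flattened voxel labels. The function name `mean_iou` and the `ignore_index` parameter are hypothetical and are not taken from the DAOcc code base; the actual benchmark protocols may apply visibility masks or exclude certain classes, which this sketch does not reproduce.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Return (per-class IoU list, mIoU) for two flat sequences of class ids.

    Hypothetical helper for illustration; not the official benchmark evaluator.
    Classes that appear in neither prediction nor ground truth get IoU None
    and are left out of the mean.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # skip voxels without a valid ground-truth label
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else float("nan")
    return ious, miou


# Toy example with three classes and six voxels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious, miou = mean_iou(pred, target, num_classes=3)
print(ious, miou)
```

Multiplying the returned mIoU by 100 gives the percentage scale used in the table.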