GaussianOcc3D: A Gaussian-Based Adaptive Multi-modal 3D Occupancy Prediction
About
3D semantic occupancy prediction is a pivotal task in autonomous driving because it provides a dense, fine-grained understanding of the surrounding environment, yet single-modality methods must trade off camera semantics against LiDAR geometry. Existing multi-modal frameworks often struggle with modality heterogeneity, spatial misalignment, and the representation crisis, in which voxels are computationally heavy and BEV alternatives are lossy. We present GaussianOcc3D, a multi-modal framework that bridges camera and LiDAR through a memory-efficient, continuous 3D Gaussian representation. We introduce four modules: (1) LiDAR Depth Feature Aggregation (LDFA), which uses depth-wise deformable sampling to lift sparse signals onto Gaussian primitives; (2) Entropy-Based Feature Smoothing (EBFS), which mitigates domain noise; (3) Adaptive Camera-LiDAR Fusion (ACLF), which applies uncertainty-aware reweighting to account for sensor reliability; and (4) a Gauss-Mamba Head, which leverages Selective State Space Models to capture global context with linear complexity. Evaluations on the Occ3D, SurroundOcc, and SemanticKITTI benchmarks demonstrate state-of-the-art performance, with mIoU scores of 49.4%, 28.9%, and 25.2%, respectively. GaussianOcc3D also exhibits superior robustness in challenging rainy and nighttime conditions.
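The abstract names entropy-based smoothing and uncertainty-aware reweighting but gives no formulas. As a rough illustration only, the sketch below treats the Shannon entropy of each modality's per-Gaussian class distribution as an uncertainty score and turns the two scores into convex fusion weights, so the less certain sensor contributes less. The function names, the softmax-over-negative-entropy weighting, and the `temperature` parameter are all hypothetical assumptions, not GaussianOcc3D's actual EBFS or ACLF design.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse_gaussian_features(
    cam_feat: Sequence[float],
    lidar_feat: Sequence[float],
    cam_logits: Sequence[float],
    lidar_logits: Sequence[float],
    temperature: float = 1.0,  # hypothetical knob: lower -> sharper preference
) -> Tuple[List[float], Tuple[float, float]]:
    """Blend one Gaussian's camera and LiDAR features, down-weighting the
    modality whose class prediction is more uncertain (higher entropy).

    Hypothetical illustration only; not the paper's ACLF formulation.
    """
    h_cam = entropy(softmax(cam_logits))
    h_lidar = entropy(softmax(lidar_logits))
    # Lower entropy -> larger weight; the two weights always sum to 1.
    w_cam, w_lidar = softmax([-h_cam / temperature, -h_lidar / temperature])
    fused = [w_cam * c + w_lidar * l for c, l in zip(cam_feat, lidar_feat)]
    return fused, (w_cam, w_lidar)


# Camera is unsure (uniform logits); LiDAR is confident about class 0.
fused, (w_cam, w_lidar) = fuse_gaussian_features(
    cam_feat=[1.0, 0.0],
    lidar_feat=[0.0, 1.0],
    cam_logits=[0.0, 0.0, 0.0],
    lidar_logits=[5.0, 0.0, 0.0],
)
print(f"w_cam={w_cam:.3f}, w_lidar={w_lidar:.3f}")
```

In this toy case LiDAR receives the larger weight because its class distribution has lower entropy; under a sensor degradation such as night, the same rule would shift weight away from whichever modality becomes less certain.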
Related benchmarks
| Task | Dataset | mIoU (%) | Rank |
|---|---|---|---|
| Semantic Occupancy Prediction | Occ3D (val) | 49.4 | 37 |
| 3D Semantic Occupancy Prediction | SurroundOcc (val) | 28.9 | 36 |
| Semantic Occupancy Prediction | SemanticKITTI (test) | 25.2 | 32 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes rainy scenario (val) | 27.1 | 26 |
| 3D Semantic Occupancy Prediction | SurroundOcc Night (val) | 15.9 | 4 |
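Every result above is an mIoU: the per-class intersection over union, IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over classes and reported as a percentage. The minimal sketch below computes it over flattened voxel labels. The ignore label `255` and the choice to skip classes absent from both ground truth and prediction are assumptions; each benchmark's own protocol (which classes are averaged, visibility or camera masks) is not modeled here.

```python
from typing import List, Optional, Sequence


def confusion_matrix(
    gt: Sequence[int], pred: Sequence[int], num_classes: int, ignore_index: int = 255
) -> List[List[int]]:
    """Rows index ground-truth classes, columns index predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # skip unlabeled / masked voxels (assumed label)
            continue
        cm[g][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU_c = TP / (TP + FP + FN); None if the class never appears."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else None)
    return ious


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """Average IoU over classes that occur in ground truth or prediction."""
    valid = [iou for iou in ious if iou is not None]
    return sum(valid) / len(valid) if valid else 0.0


gt = [0, 0, 1, 1, 2, 2, 255]
pred = [0, 1, 1, 1, 2, 0, 0]
cm = confusion_matrix(gt, pred, num_classes=3)
ious = per_class_iou(cm)
print([round(x, 3) for x in ious], round(100 * mean_iou(ious), 1))  # → [0.333, 0.667, 0.5] 50.0
```

Averaging per class, rather than pooling all voxels, keeps rare classes such as pedestrians from being swamped by frequent ones such as drivable surface.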