Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction

About

3D semantic occupancy prediction is crucial for autonomous driving. While multi-modal fusion improves accuracy over vision-only methods, it typically relies on computationally expensive dense voxel or BEV tensors. We present Gau-Occ, a multi-modal framework that bypasses dense volumetric processing by modeling the scene as a compact collection of semantic 3D Gaussians. To ensure geometric completeness, we propose a LiDAR Completion Diffuser (LCD) that recovers missing structures from sparse LiDAR to initialize robust Gaussian anchors. Furthermore, we introduce Gaussian Anchor Fusion (GAF), which efficiently integrates multi-view image semantics via geometry-aligned 2D sampling and cross-modal alignment. By refining these compact Gaussian descriptors, Gau-Occ captures both spatial consistency and semantic discriminability. Extensive experiments across challenging benchmarks demonstrate that Gau-Occ achieves state-of-the-art performance with significant computational efficiency.
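The abstract says GAF integrates image semantics through geometry-aligned 2D sampling, but gives no details. One common way to read that phrase is: project each 3D Gaussian anchor center into a camera image, then bilinearly sample the image feature map at the projected pixel. The sketch below illustrates only that generic idea and is not the authors' implementation. The function names `project_points` and `bilinear_sample`, the pinhole intrinsics `K`, the extrinsic matrix `T_cam_from_world`, and the `(H, W, C)` feature layout are all assumptions made for illustration.

```python
import numpy as np

def project_points(points_xyz, K, T_cam_from_world):
    """Project (N, 3) world points to pixel coordinates with a pinhole model.

    Returns uv of shape (N, 2) and the camera-frame depth of shape (N,).
    K is a hypothetical 3x3 intrinsic matrix, T_cam_from_world a 4x4 extrinsic.
    """
    n = points_xyz.shape[0]
    homo = np.concatenate([points_xyz, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (T_cam_from_world @ homo.T).T[:, :3]                    # (N, 3)
    depth = cam[:, 2]
    safe = np.where(depth > 1e-6, depth, 1.0)  # avoid dividing by zero
    pix = (K @ cam.T).T                                           # (N, 3)
    uv = pix[:, :2] / safe[:, None]
    return uv, depth

def bilinear_sample(feat, uv, depth):
    """Bilinearly sample feat (H, W, C) at pixel coords uv (u = column, v = row).

    Points behind the camera or outside the image get zero features and
    valid = False.
    """
    h, w, _ = feat.shape
    u, v = uv[:, 0], uv[:, 1]
    valid = (depth > 1e-6) & (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    u = np.clip(u, 0, w - 1)
    v = np.clip(v, 0, h - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, w - 1)
    v1 = np.minimum(v0 + 1, h - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bot = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    out = top * (1 - dv) + bot * dv
    out[~valid] = 0.0
    return out, valid
```

In a multi-view setting, a sketch like this would be run once per camera and the valid samples for each anchor combined, for example by averaging. How Gau-Occ actually aggregates views and performs cross-modal alignment is not stated in the abstract.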

Chengxin Lv, Yihui Li, Hongyu Yang, YunHong Wang • 2026

Related benchmarks

Task	Dataset	Result
3D Occupancy Prediction	Occ3D-nuScenes (val)	mIoU 55.1	215
3D Semantic Occupancy Prediction	SurroundOcc-nuScenes (val)	mIoU 32.7	59
3D Semantic Occupancy Prediction	nuScenes (val)	IoU 44.3	15
3D Semantic Occupancy Prediction	KITTI-360 (val)	IoU 58.9	11
