TFusionOcc: T-Primitive Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction
About
The prediction of 3D semantic occupancy enables autonomous vehicles (AVs) to perceive the fine-grained geometric and semantic scene structure for safe navigation and decision-making. Existing methods mainly rely either on voxel-based representations, which incur redundant computation over empty regions, or on object-centric Gaussian primitives, which are limited in modeling complex, non-convex, and asymmetric structures. In this paper, we present TFusionOcc, a T-primitive-based object-centric multi-sensor fusion framework for 3D semantic occupancy prediction. Specifically, we introduce a family of Student's t-distribution-based T-primitives, including the plain T-primitive, T-Superquadric, and deformable T-Superquadric with inverse warping, where the deformable T-Superquadric serves as the key geometry-enhancing primitive. We further develop a unified probabilistic formulation based on the Student's t-distribution and the T-mixture model (TMM) to jointly model occupancy and semantics, and design a tightly coupled multi-stage fusion architecture to effectively integrate camera and LiDAR cues. Extensive experiments on nuScenes show state-of-the-art performance, while additional evaluations on nuScenes-C demonstrate strong robustness under most corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc
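To make the idea of a Student's t-based primitive concrete, the following is a minimal illustrative sketch, not the paper's formulation. It assumes an axis-aligned, unnormalized t kernel and a simple mixture rule in which occupancy is the probability that at least one primitive covers a query point and semantics are the kernel-weighted average of per-primitive class probabilities. The names `t_kernel`, `gaussian_kernel`, `predict`, and the dictionary keys (`mean`, `scale`, `nu`, `opacity`, `class_probs`) are hypothetical and chosen only for illustration.

```python
import math


def squared_distance(point, mean, scale):
    """Squared distance from the mean, measured in units of the per-axis scale."""
    return sum(((p - m) / s) ** 2 for p, m, s in zip(point, mean, scale))


def t_kernel(point, mean, scale, nu):
    """Unnormalized Student's t kernel with nu degrees of freedom (1 at the mean)."""
    d2 = squared_distance(point, mean, scale)
    dim = len(point)
    return (1.0 + d2 / nu) ** (-(nu + dim) / 2.0)


def gaussian_kernel(point, mean, scale):
    """Unnormalized Gaussian kernel, included only for comparison."""
    return math.exp(-0.5 * squared_distance(point, mean, scale))


def predict(point, primitives):
    """Return (occupancy probability, semantic distribution) at a query point.

    Assumed mixture rule (an illustration, not taken from the paper):
    each primitive contributes weight = opacity * kernel value.
    """
    weights = [
        prim["opacity"] * t_kernel(point, prim["mean"], prim["scale"], prim["nu"])
        for prim in primitives
    ]

    # Occupancy: probability that at least one primitive covers the point.
    free = 1.0
    for w in weights:
        free *= 1.0 - w
    occupancy = 1.0 - free

    # Semantics: kernel-weighted average of per-primitive class probabilities.
    num_classes = len(primitives[0]["class_probs"])
    total = sum(weights)
    if total == 0.0:
        semantics = [1.0 / num_classes] * num_classes
    else:
        semantics = [
            sum(w * prim["class_probs"][c] for w, prim in zip(weights, primitives)) / total
            for c in range(num_classes)
        ]
    return occupancy, semantics


# Two hypothetical primitives with two semantic classes each.
primitives = [
    {"mean": (0.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0), "nu": 3.0,
     "opacity": 0.9, "class_probs": [0.8, 0.2]},
    {"mean": (4.0, 0.0, 0.0), "scale": (0.5, 0.5, 0.5), "nu": 3.0,
     "opacity": 0.7, "class_probs": [0.1, 0.9]},
]

occupancy, semantics = predict((1.0, 0.0, 0.0), primitives)
```

Under these assumptions, the t kernel decays polynomially rather than exponentially with distance, so a primitive retains more influence far from its center than a Gaussian with the same scale. How TFusionOcc actually parameterizes and combines its primitives is defined in the paper and its code release.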
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Occupancy Prediction | Occ3D-nuScenes (val) | mIoU | 5.34e+3 | 213 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes (val) | mIoU | 31.47 | 59 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes rainy scenario (val) | mIoU | 30.82 | 26 |
| 3D Semantic Occupancy Prediction | SurroundOcc-nuScenes night scenario (val) | mIoU (Mean IoU) | 19.67 | 22 |
| 3D Semantic Occupancy Prediction | nuScenes-C Camera Corruption v1.0 (val) | Clean Score | 47.01 | 12 |
| 3D Semantic Occupancy Prediction | nuScenes (val) | Latency (ms) | 278.5 | 11 |
| 3D Semantic Occupancy Prediction | nuScenes-C Lidar Corruption | mIoU (Clean) | 27.5 | 10 |