Test-Time 3D Occupancy Prediction
About
Self-supervised 3D occupancy prediction offers a promising solution for understanding complex driving scenes without requiring costly 3D annotations. However, training dense occupancy decoders to capture fine-grained geometry and semantics can demand hundreds of GPU hours, and once trained, such models struggle to adapt to varying voxel resolutions or novel object categories without extensive retraining. To overcome these limitations, we propose a practical and flexible test-time occupancy prediction framework termed TT-Occ. Our method incrementally constructs, optimizes, and voxelizes time-aware 3D Gaussians from raw sensor streams by integrating vision foundation models (VFMs) at runtime. The flexible representation of 3D Gaussians enables voxelization at arbitrary user-specified resolutions, while the strong generalization capability of VFMs supports accurate perception and open-vocabulary recognition without requiring any network training or fine-tuning. To validate the generality and effectiveness of our framework, we present two variants, one LiDAR-based and one vision-centric, and conduct extensive experiments on the Occ3D-nuScenes and nuCraft benchmarks under varying voxel resolutions. Experimental results show that TT-Occ significantly outperforms existing computationally expensive pretrained self-supervised counterparts. Code is available at https://github.com/Xian-Bei/TT-Occ.
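The voxelization step described above, converting a set of semantic 3D Gaussians into an occupancy grid at a user-specified resolution, can be sketched as follows. This is an illustrative simplification, not the TT-Occ implementation: each Gaussian is reduced to a point at its mean, and each voxel takes the label of the highest-opacity Gaussian that falls inside it (the function name and arguments are hypothetical).

```python
import numpy as np

def voxelize_gaussians(means, labels, opacities, voxel_size, pc_range):
    """Voxelize semantic 3D Gaussians at an arbitrary resolution.

    Hedged sketch: real splatting would rasterize each Gaussian's full
    extent; here each Gaussian contributes only at its mean position.
    means:     (N, 3) Gaussian centers in world coordinates.
    labels:    (N,)   integer semantic labels.
    opacities: (N,)   per-Gaussian opacities used to break ties.
    pc_range:  [x_min, y_min, z_min, x_max, y_max, z_max].
    """
    lo = np.asarray(pc_range[:3], dtype=np.float64)
    hi = np.asarray(pc_range[3:], dtype=np.float64)
    dims = tuple(np.ceil((hi - lo) / voxel_size).astype(int))

    # Keep only Gaussians inside the volume of interest.
    inside = np.all((means >= lo) & (means < hi), axis=1)
    means, labels, opacities = means[inside], labels[inside], opacities[inside]

    # Map each Gaussian mean to a flat voxel index.
    idx = np.floor((means - lo) / voxel_size).astype(int)
    flat = np.ravel_multi_index(idx.T, dims)

    # Sort by (voxel, opacity): the last entry per voxel has max opacity.
    order = np.lexsort((opacities, flat))
    flat, labels = flat[order], labels[order]
    last = np.r_[flat[1:] != flat[:-1], True]  # last occurrence per voxel

    grid = np.full(dims, -1, dtype=int)  # -1 marks empty voxels
    grid.flat[flat[last]] = labels[last]
    return grid
```

Because the grid shape is derived from `voxel_size` at call time, the same Gaussian set can be re-voxelized at any resolution without retraining, which is the flexibility the abstract refers to.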
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Semantic Occupancy Prediction | Occ3D | RayIoU 13.4 | 40 |
| 3D Semantic Occupancy Prediction | Occ3D-nuScenes v1.0 (val) | mIoU 27.41 | 26 |
| Semantic Occupancy Estimation | Occ3D-nuScenes | mIoU 16.7 | 9 |
| 3D Semantic Occupancy Prediction | nuCraft high-resolution | Overall mIoU 10.92 | 4 |
| Occupancy Prediction | nuScenes Rainy and Nighttime scenes v1.0 (test) | Score 0911 (Rainy)27 | 3 |