
VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding

About

Online 3D occupancy prediction and mapping, which incrementally constructs dense spatial representations on the fly, is crucial for autonomous exploration. However, recent Gaussian-centric methods struggle with structural boundary fidelity and rely heavily on predefined scene-size priors, which limits their operational efficiency. In this work, we present VEOcc, a voxel-centric framework formulated as a recursive perception-and-assimilation paradigm. By eliminating the need for initial scale estimation, VEOcc enables streamlined, open-ended map expansion. To robustly aggregate noisy temporal observations within the discrete voxel space, we further propose a Spatio-Temporal-Aware Online Update Strategy. It integrates Cross-Temporal Logit Aggregation (TLA) for temporal consistency, Reliability-Aware Confidence Modulation (RCM) for spatial uncertainty calibration, and Confidence-Driven Incremental State Update (CSU) for robust global state assimilation. Extensive experiments on Occ-ScanNet and EmbodiedOcc-ScanNet demonstrate that VEOcc establishes new state-of-the-art performance in both local and embodied settings. Notably, zero-shot evaluations on self-collected video sequences further confirm its out-of-distribution generalization to unseen real-world environments. Our framework thus provides an accurate and efficient solution for autonomous exploration. Code and supplementary visualizations are available on our project page: https://wryzju.github.io/VEOcc/.
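
To make the idea of open-ended, confidence-driven voxel updates concrete, here is a minimal sketch. It is not the VEOcc implementation and does not reproduce TLA, RCM, or CSU; it only illustrates, under simplifying assumptions, how per-voxel class logits might be fused across frames with a confidence-weighted running average in a sparse map that needs no scene-size prior. The names `SparseVoxelMap`, `VoxelState`, `update`, and `label`, as well as the default voxel size, are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VoxelState:
    logits: list        # fused per-class logits for this voxel
    confidence: float   # accumulated confidence weight


class SparseVoxelMap:
    """Hypothetical sparse voxel map; grows on demand, so no scene extent is needed."""

    def __init__(self, voxel_size=0.08, num_classes=3):
        # voxel_size is a placeholder value, not taken from the source.
        self.voxel_size = voxel_size
        self.num_classes = num_classes
        self.voxels = {}  # (i, j, k) integer index -> VoxelState

    def key(self, point):
        # Quantize a metric 3D point to an integer voxel index.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def update(self, point, logits, confidence):
        # Assumed fusion rule: confidence-weighted running average of logits.
        k = self.key(point)
        state = self.voxels.get(k)
        if state is None:
            # First observation of this voxel: allocate it on the fly.
            self.voxels[k] = VoxelState(list(logits), confidence)
            return
        total = state.confidence + confidence
        if total <= 0:
            return  # nothing reliable to fuse
        w = confidence / total
        state.logits = [(1 - w) * old + w * new
                        for old, new in zip(state.logits, logits)]
        state.confidence = total

    def label(self, point):
        # Return the argmax class of the fused logits, or None if unobserved.
        state = self.voxels.get(self.key(point))
        if state is None:
            return None
        return max(range(self.num_classes), key=lambda c: state.logits[c])


# Usage: two observations of the same voxel, then one far-away voxel.
vmap = SparseVoxelMap(voxel_size=0.5, num_classes=2)
vmap.update((0.1, 0.1, 0.1), [2.0, 0.0], confidence=1.0)
vmap.update((0.4, 0.2, 0.3), [0.0, 4.0], confidence=3.0)
vmap.update((-7.3, 12.0, 0.0), [1.0, 0.0], confidence=0.5)
```

In this sketch the more confident second observation dominates the fused logits, and the distant third observation simply allocates a new voxel, which is the sense in which a hash-based voxel map can expand without a predefined scene size.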

Ruoyu Wang, Yong Liu, Sheng Tao, Yuhang Lin, Yukai Ma • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Embodied 3D Occupancy Prediction | EmbodiedOcc-ScanNet | SC-IoU: 62.21 | 11 |
| Local Occupancy Prediction | Occ-ScanNet Mini | Overall IoU: 67.89 | 7 |
| Local Occupancy Prediction | Occ-ScanNet | IoU: 64.55 | 7 |
| Embodied Occupancy Prediction | EmbodiedOcc-ScanNet Mini | IoU: 64.19 | 3 |
| Embodied Occupancy Prediction | EmbodiedOcc-ScanNet | Parameters (M): 177.5 | 2 |
| Local Occupancy Prediction | OccScanNet | Parameters (M): 177.1 | 2 |
