
3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

About

Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in 3D space. From this view, a world model should not merely render what may be seen, but maintain and update an agent's belief about the unobserved 3D world as new observations are acquired. We identify several key capabilities for such models, including spatially consistent scene memory, multi-hypothesis belief sampling, sequential belief updating, and semantically informed prediction of unseen regions. We instantiate these ideas in 3D-Belief, a generative 3D world model that infers explicit, actionable 3D beliefs from partial observations and updates them online over time. Unlike prior visual prediction models, 3D-Belief represents uncertainty directly in 3D, enabling embodied agents to imagine plausible scene completions and reason over partially observed environments. We evaluate 3D-Belief on 2D visual quality for scene memory and unobserved-scene imagination, object- and scene-level 3D imagination using our proposed 3D-CORE benchmark, and challenging object navigation tasks in both simulation and the real world. Experiments show that 3D-Belief improves 2D and 3D imagination quality and downstream embodied task performance compared to state-of-the-art methods.
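To make the notions of sequential belief updating and multi-hypothesis belief sampling concrete, here is a minimal sketch using a classical log-odds occupancy grid. This is a swapped-in textbook technique and not the 3D-Belief model, which is a learned generative model whose architecture is not described in this abstract. The class `VoxelBelief`, the symmetric sensor model, and the `hit_prob` parameter are all hypothetical names and assumptions chosen for illustration.

```python
import math
import random


def logit(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


class VoxelBelief:
    """Toy 3D belief: independent occupancy probability per voxel (assumption)."""

    def __init__(self, shape, prior=0.5):
        nx, ny, nz = shape
        # Every voxel starts at the prior; 0.5 means "completely unknown".
        self.log_odds = {
            (x, y, z): logit(prior)
            for x in range(nx)
            for y in range(ny)
            for z in range(nz)
        }

    def update(self, voxel, observed_occupied, hit_prob=0.9):
        """Sequential Bayes update for one observed voxel.

        Assumes a symmetric sensor that reports the true state with
        probability hit_prob, so the log-likelihood ratio is +/- logit(hit_prob).
        """
        delta = logit(hit_prob)
        self.log_odds[voxel] += delta if observed_occupied else -delta

    def probability(self, voxel):
        return sigmoid(self.log_odds[voxel])

    def sample(self, rng):
        """Draw one hypothesis: a full binary occupancy grid."""
        return {v: rng.random() < sigmoid(lo) for v, lo in self.log_odds.items()}


belief = VoxelBelief((2, 2, 2))
belief.update((0, 0, 0), observed_occupied=True)   # first observation
belief.update((0, 0, 0), observed_occupied=True)   # second, confirming observation
belief.update((1, 1, 1), observed_occupied=False)  # observed free space

rng = random.Random(0)
hypotheses = [belief.sample(rng) for _ in range(3)]  # three plausible worlds
```

Observed voxels become confident while unobserved ones stay at the prior, so sampled hypotheses agree on what was seen and differ elsewhere. A learned generative model would replace the uninformative prior with semantically informed predictions of the unseen regions.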

Yifan Yin, Zehao Wen, Suyu Ye, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Generation | RealEstate10K (Re10K) (test) | PSNR | 20.01 | 16 |
| Object Navigation | AI2-THOR (Simulations) | Success Rate (SR) | 59.17 | 12 |
| Object Navigation | Simulations | SR (%) | 59.17 | 8 |
| Object Navigation | Real-world | Success Rate (SR) | 55.56 | 4 |
| Spatial Reasoning QA | SAT (real) | Average Accuracy | 88.7 | 4 |
| 2D Visual Quality of Belief Prediction | AI2-THOR Imagined Scene (test) | FVD | 271.8 | 3 |
| 2D Visual Quality of Belief Prediction (Observed Scene) | AI2-THOR (test) | LPIPS | 0.0502 | 3 |
| Object Completion | 3D-CORE 55% Visibility | BEV IoU | 48.4 | 2 |
| Object Permanence | 3D-CORE | LPIPS | 0.123 | 2 |
| Room Completion | 3D-CORE | Obj. F1 | 53.6 | 2 |
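For readers unfamiliar with the metrics listed above, the following sketch shows the standard textbook definitions of three of them: PSNR, intersection-over-union on a bird's-eye-view (BEV) grid, and success rate. The function names, the flat-list image representation, and the set-of-cells grid representation are assumptions made for illustration. The paper's exact evaluation protocol (resolution, thresholds, averaging) is not given on this page and may differ.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def bev_iou(pred_cells, gt_cells):
    """IoU between two sets of occupied (row, col) cells in a top-down grid."""
    pred_cells, gt_cells = set(pred_cells), set(gt_cells)
    union = pred_cells | gt_cells
    if not union:
        return 1.0  # both empty: treated as perfect agreement (assumption)
    return len(pred_cells & gt_cells) / len(union)


def success_rate(outcomes):
    """Percentage of episodes that ended in success."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one episode")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


# Toy inputs, not data from the paper.
example_psnr = psnr([0.0, 0.0], [0.1, 0.1])
example_iou = bev_iou({(0, 0), (0, 1)}, {(0, 1), (1, 1)})
example_sr = success_rate([True, False, True, True])
```

Note that `bev_iou` returns a fraction in [0, 1], whereas the table appears to report IoU and F1 on a 0 to 100 scale. Higher is better for PSNR, IoU, F1, accuracy, and SR, while lower is better for FVD and LPIPS.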