3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

About

Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual realism rather than the structured uncertainty required by embodied agents acting under partial observability. In this work, we propose a different perspective: world modeling as embodied belief inference in 3D space. From this view, a world model should not merely render what may be seen, but maintain and update an agent's belief about the unobserved 3D world as new observations are acquired. We identify several key capabilities for such models, including spatially consistent scene memory, multi-hypothesis belief sampling, sequential belief updating, and semantically informed prediction of unseen regions. We instantiate these ideas in 3D-Belief, a generative 3D world model that infers explicit, actionable 3D beliefs from partial observations and updates them online over time. Unlike prior visual prediction models, 3D-Belief represents uncertainty directly in 3D, enabling embodied agents to imagine plausible scene completions and reason over partially observed environments. We evaluate 3D-Belief on 2D visual quality for scene memory and unobserved-scene imagination, object- and scene-level 3D imagination using our proposed 3D-CORE benchmark, and challenging object navigation tasks in both simulation and the real world. Experiments show that 3D-Belief improves 2D and 3D imagination quality and downstream embodied task performance compared to state-of-the-art methods.

Yifan Yin, Zehao Wen, Suyu Ye, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu • 2026

Related benchmarks

Task	Dataset	Result
Video Generation	RealEstate10K (Re10K) (test)	PSNR 20.01	16
Object Navigation	AI2-THOR (Simulations)	Success Rate (SR) 59.17	12
Object Navigation	Simulations	SR (%) 59.17	8
Object Navigation	Real-world	Success Rate (SR) 55.56	4
Spatial Reasoning QA	SAT (real)	Average Accuracy 88.7	4
2D Visual Quality of Belief Prediction	AI2-THOR Imagined Scene (test)	FVD 271.8	3
2D Visual Quality of Belief Prediction (Observed Scene)	AI2-THOR (test)	LPIPS 0.0502	3
Object Completion	3D-CORE 55% Visibility	BEV IoU 48.4	2
Object Permanence	3D-CORE	LPIPS 0.123	2
Room Completion	3D-CORE	Obj. F1 53.6	2
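
For readers unfamiliar with two of the metrics listed above, the following is a generic sketch of how PSNR and a bird's-eye-view (BEV) IoU are commonly computed. It is not the paper's evaluation code: the function names `psnr` and `bev_iou`, the flat-list image representation, and the binary occupancy-grid input are all assumptions made for illustration. The table appears to report IoU as a percentage, whereas this sketch returns a fraction in [0, 1].

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Hypothetical helper: assumes pixel values lie in [0, max_val].
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def bev_iou(pred_grid, target_grid):
    """Intersection over union of two binary BEV occupancy grids.

    Hypothetical helper: each grid is a list of rows of 0/1 cells
    with matching shapes.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred_grid, target_grid):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty grids are treated as a perfect match by convention here.
    return inter / union if union else 1.0
```

Higher PSNR and IoU indicate closer agreement with the ground truth, while for LPIPS and FVD lower values are better.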
