SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL

About

Current one-pass 3D scene synthesis methods often suffer from spatial hallucinations, such as collisions, due to a lack of deliberative reasoning. To bridge this gap, we introduce SceneReVis, a vision-grounded self-reflection framework that employs an iterative "diagnose-and-act" loop to explicitly intercept and resolve spatial conflicts using multi-modal feedback. To support this step-wise paradigm, we construct SceneChain-12k, a large-scale dataset of causal construction trajectories derived through a novel reverse engineering pipeline. We further propose a two-stage training recipe that transitions from Supervised Fine-Tuning to Agentic Reinforcement Learning, evolving the model into an active spatial planner. Extensive experiments demonstrate that SceneReVis achieves state-of-the-art performance in high-fidelity generation and goal-oriented optimization, with robust generalization to long-tail domains.

Yang Zhao, Shizhao Sun, Meisheng Zhang, Yingdong Shi, Xubo Yang, Jiang Bian • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Indoor Scene Synthesis | Bedroom (Standard Split) | CNR | 4.6 | 13 |
| 3D Indoor Scene Synthesis | Living Room (Standard Split) | OBR | 1.2 | 7 |
| 3D Indoor Scene Synthesis | Avg. Bed + Living (Standard Split) | OBR | 2 | 7 |
| 3D Indoor Scene Synthesis | User Study | Physical Plausibility | 1.8 | 5 |
| 3D Indoor Scene Synthesis | Dining Room (Generalization Split) | OBR | 0.1 | 5 |
| 3D Indoor Scene Synthesis | Study Room (Generalization Split) | Object Realism (OBR) | 0.5 | 5 |
| 3D Indoor Scene Synthesis | Dining + Study Average (Generalization Split) | OBR (Object Realism) | 0.3 | 5 |
| Goal-oriented Scene Optimization | SceneChain-12k Cond 1: Chaotic & Missing | OBR | 1.1 | 3 |
| Goal-oriented Scene Optimization | SceneChain-12k Cond 2: Chaotic Only | OBR | 2.7 | 3 |
| Goal-oriented Scene Optimization | SceneChain-12k Cond 3: Missing Only | OBR | 0.011 | 3 |
Showing 10 of 11 rows
