ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing
About
Scene synthesis and editing have emerged as a promising direction in computer graphics. Current trained approaches for 3D indoor scene generation either oversimplify object semantics through one-hot class encodings (e.g., 'chair' or 'table'), require masked diffusion for editing, ignore room boundaries, or rely on floor plan renderings that fail to capture complex layouts. LLM-based methods enable richer semantics via natural language, but lack editing functionality, are limited to rectangular layouts, or rely on weak spatial reasoning from implicit world models. We introduce ReSpace, a generative framework for autoregressive text-driven 3D indoor scene synthesis and editing. Our approach features a compact structured scene representation with explicit room boundaries that enables asset-agnostic deployment and frames scene manipulation as a next-token prediction task, supporting object addition, removal, and swapping via natural language. We employ supervised fine-tuning with a preference alignment stage to train a specialized language model for object addition that accounts for user instructions, spatial geometry, object semantics, and scene-level composition. We further introduce a voxelization-based evaluation metric that captures fine-grained geometric violations beyond 3D bounding boxes. In experiments, ReSpace surpasses the state of the art on object addition and achieves superior human-perceived quality on full scene synthesis, despite not being trained for that task.
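To make the idea of checking geometric violations against explicit room boundaries more concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes a simplified 2D setting: the room is a floor polygon in the (x, z) plane, an object is reduced to an axis-aligned footprint, and the footprint is split into voxel cells whose centers are tested against the polygon. All names (`point_in_polygon`, `box_voxel_centers`, `count_out_of_bounds`) and the example room are hypothetical.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]


def point_in_polygon(p: Point2D, polygon: List[Point2D]) -> bool:
    """Ray-casting test: True if p lies strictly inside the polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_voxel_centers(center: Point2D, size: Point2D, voxel: float) -> List[Point2D]:
    """Centers of the voxel cells covering an axis-aligned footprint (assumed simplification)."""
    cx, cz = center
    sx, sz = size
    nx = max(1, round(sx / voxel))
    nz = max(1, round(sz / voxel))
    x0, z0 = cx - sx / 2, cz - sz / 2
    return [
        (x0 + (i + 0.5) * sx / nx, z0 + (j + 0.5) * sz / nz)
        for i in range(nx)
        for j in range(nz)
    ]


def count_out_of_bounds(centers: List[Point2D], room_polygon: List[Point2D]) -> int:
    """Number of voxel centers that fall outside the room polygon."""
    return sum(1 for c in centers if not point_in_polygon(c, room_polygon))


# Hypothetical L-shaped room: a 4 x 4 square with the top-right 2 x 2 corner cut out.
l_shaped_room = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

# A 1 x 1 footprint that pokes slightly into the cut-out corner.
cells = box_voxel_centers(center=(2.0, 1.75), size=(1.0, 1.0), voxel=0.25)
violations = count_out_of_bounds(cells, l_shaped_room)
```

In this toy example the footprint lies entirely within the room's 4 x 4 bounding box, so a bounding-box check would report no violation, while the per-voxel test against the actual polygon flags the cells in the cut-out corner. A full 3D version would voxelize the object mesh rather than its footprint.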
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Indoor Scene Synthesis | Bedroom (Standard Split) | CNR | 41.9 | 13 |
| 3D Indoor Scene Synthesis | Avg. Bed + Living (Standard Split) | OBR | 13.1 | 7 |
| 3D Indoor Scene Synthesis | Living Room (Standard Split) | OBR | 11.5 | 7 |
| 3D Indoor Scene Synthesis | 3D-FRONT rectangular-only subset of 3 × 257 scenes (all) | OOB | 4.6 | 6 |
| Full Scene Synthesis | rectangular-only (all) | BT Score | 0.4251 | 5 |
| Scene Synthesis | 3D-FRONT bedrooms unseen floor plans (test) | OOB | 2.9 | 4 |
| Scene Synthesis | 3D-FRONT living rooms unseen floor plans (test) | OOB | 4.5 | 4 |
| Scene Synthesis | 3D-FRONT unseen floor plans (test) | OOB Error | 4.2 | 4 |
| Object Addition | 3D-FRONT bed (hold-out test) | OOB | 11.77 | 4 |
| Object Addition | 3D-FRONT liv (test) | OOB Score | 10.68 | 4 |