
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

About

Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations (raw coordinates or verbose code) make it difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as bird's-eye-view (BEV) grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes spanning five scenarios of varying complexity, SG-Agent improves spatial fidelity and physical plausibility over prior methods, while SG-Mini performs competitively against larger LLM-based baselines in single-shot generation scenarios.
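The abstract describes layouts written as BEV grid placements that compile deterministically to 3D geometry, which can then be checked for collisions, with SG-Agent feeding those errors back to refine the scene. The sketch below illustrates that idea under loudly stated assumptions: the `Placement` schema, the 0.25 m `CELL` size, the floor-only gravity model (no stacking), and the greedy `relocate` repair are all hypothetical. They are not SpatialGrammar's actual syntax or SG-Agent's model-driven refinement.

```python
from dataclasses import dataclass, replace
from itertools import combinations

CELL = 0.25  # hypothetical grid resolution: meters per BEV cell

@dataclass(frozen=True)
class Placement:
    """One object on the BEV grid (hypothetical schema, not the paper's DSL syntax)."""
    name: str
    col: int        # grid x index of the footprint's min corner
    row: int        # grid y index of the footprint's min corner
    w: int          # footprint width in cells, before rotation
    d: int          # footprint depth in cells, before rotation
    height: float   # object height in meters
    rot: int = 0    # yaw in degrees, restricted to multiples of 90

@dataclass(frozen=True)
class Box:
    name: str
    lo: tuple  # (x, y, z) min corner in meters
    hi: tuple  # (x, y, z) max corner in meters

def compile_placement(p: Placement) -> Box:
    """Deterministically map a grid placement to an axis-aligned box resting on the floor."""
    if p.rot % 90 != 0:
        raise ValueError(f"{p.name}: rotation must be a multiple of 90 degrees")
    # Simplification: a 90/270 degree yaw just swaps width and depth about the min corner.
    w, d = (p.d, p.w) if (p.rot // 90) % 2 else (p.w, p.d)
    x0, y0 = p.col * CELL, p.row * CELL
    return Box(p.name, (x0, y0, 0.0), (x0 + w * CELL, y0 + d * CELL, p.height))

def overlaps(a: Box, b: Box) -> bool:
    """True if two boxes intersect with positive volume; touching faces is allowed."""
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))

def violations(layout, room_w, room_d):
    """Compile the layout and return (object_names, message) for each failed constraint."""
    boxes = [compile_placement(p) for p in layout]
    out = []
    max_x, max_y = room_w * CELL, room_d * CELL
    for b in boxes:
        if b.lo[0] < 0 or b.lo[1] < 0 or b.hi[0] > max_x or b.hi[1] > max_y:
            out.append(((b.name,), f"{b.name} extends outside the room"))
    for a, b in combinations(boxes, 2):
        if overlaps(a, b):
            out.append(((a.name, b.name), f"{b.name} collides with {a.name}"))
    return out

def relocate(layout, name, room_w, room_d):
    """Hypothetical stand-in for a model's fix: move `name` to the first valid cell in raster order."""
    idx = next(i for i, p in enumerate(layout) if p.name == name)
    for row in range(room_d):
        for col in range(room_w):
            trial = list(layout)
            trial[idx] = replace(layout[idx], col=col, row=row)
            if not any(name in names for names, _ in violations(trial, room_w, room_d)):
                return trial
    return layout  # no valid cell found; leave the layout unchanged

def refine(layout, room_w, room_d, max_iters=10):
    """Toy closed loop: compile, check, repair one offender, repeat until the layout verifies."""
    layout = list(layout)
    for _ in range(max_iters):
        errs = violations(layout, room_w, room_d)
        if not errs:
            return layout, []
        names, _msg = errs[0]
        layout = relocate(layout, names[-1], room_w, room_d)
    return layout, violations(layout, room_w, room_d)

if __name__ == "__main__":
    scene = [
        Placement("bed", col=0, row=0, w=8, d=6, height=0.5),
        Placement("desk", col=4, row=2, w=4, d=2, height=0.75),  # overlaps the bed
    ]
    print(violations(scene, 16, 12))  # 16 x 12 cells = a 4 m x 3 m room
    fixed, remaining = refine(scene, 16, 12)
    print(fixed[1], remaining)
```

In this toy run the desk is moved to the first cell where it only touches the bed (column 8, row 0). Because the checks run on compiled geometry rather than on free-form coordinates, each error names concrete objects, which is the kind of signal a closed-loop agent can act on; in the paper's system the repair step is presumably model-driven rather than a raster search.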

Song Tang, Kaiyong Zhao, Yuliang Li, Qingsong Yan, Penglei Sun, Junyi Zou, Qiang Wang, Xiaowen Chu • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Scene Layout Generation | Core generation scenarios: Single-Object | DRFR | 99 | 6
3D Scene Layout Generation | Core generation scenarios: Multi-Object | DRFR | 83 | 6
3D Scene Layout Generation | Core generation scenarios: Hierarchical | DRFR | 0.93 | 6
Architectural generation | Architectural generation | DRFR | 85 | 3
Multi-turn Editing | Multi-turn editing | DRFR | 92 | 2
