
RoomPilot: Controllable Synthesis of Interactive Indoor Environments via Multimodal Semantic Parsing

About

Generating controllable and interactive indoor scenes is fundamental to applications in game development, architectural visualization, and embodied AI training. Yet existing approaches either handle a narrow range of input modalities or rely on stochastic processes that hinder controllability. To overcome these limitations, we introduce RoomPilot, a unified framework that parses diverse multimodal inputs (textual descriptions or CAD floor plans) into an Indoor Domain-Specific Language (IDSL) for structured indoor scene generation. The key insight is that a well-designed IDSL can act as a shared semantic representation, enabling coherent, high-quality scene synthesis from any single modality while preserving interaction semantics. In contrast to conventional procedural methods that produce visually plausible but functionally inert layouts, RoomPilot leverages a curated dataset of interaction-annotated assets to synthesize environments with realistic object behaviors. Extensive experiments validate its strong multimodal understanding, fine-grained controllability in scene generation, and superior physical consistency and visual fidelity, marking a significant step toward general-purpose controllable 3D indoor scene generation.

Wentang Chen, Shougao Zhang, Yiman Zhang, Tianhao Zhou, Ruihui Li • 2025
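The abstract describes the IDSL only at a high level and does not give its grammar. To make the idea of one shared intermediate representation concrete, the sketch below assumes a toy, hypothetical IDSL-like schema. The names `Room`, `SceneObject`, `Interaction`, `INTERACTIONS`, `parse_text` and `parse_floor_plan` are invented for illustration and are not RoomPilot's. Two simplified front-ends, one for a fixed text pattern and one for a floor plan given as a dict with an outline and labelled symbols, both emit the same structure, and interaction annotations are attached per object category. This is not the paper's parser or DSL.

```python
import re
from dataclasses import dataclass, field

# Hypothetical IDSL-like schema (illustrative assumption, not RoomPilot's IDSL).
@dataclass(frozen=True)
class Interaction:
    kind: str  # assumed vocabulary, e.g. "open", "toggle"

@dataclass
class SceneObject:
    category: str
    count: int = 1
    interactions: tuple[Interaction, ...] = ()

@dataclass
class Room:
    room_type: str
    width_m: float
    depth_m: float
    objects: list[SceneObject] = field(default_factory=list)

# Hypothetical per-category interaction annotations (assumption).
INTERACTIONS = {
    "fridge": (Interaction("open"),),
    "lamp": (Interaction("toggle"),),
}

def _obj(category: str, count: int = 1) -> SceneObject:
    # Attach whatever interactions are annotated for this category, if any.
    return SceneObject(category, count, INTERACTIONS.get(category, ()))

def parse_text(text: str) -> Room:
    """Toy text front-end for phrases like '4x3 m kitchen with a fridge, 2 chairs'."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)x(\d+(?:\.\d+)?) m (\w+) with (.+)", text)
    if m is None:
        raise ValueError(f"unsupported description: {text!r}")
    width, depth, room_type, rest = m.groups()
    objects = []
    for part in rest.split(","):
        qty, name = part.strip().split(" ", 1)
        count = 1 if qty in ("a", "an") else int(qty)
        # Naive singularization: drop one trailing "s" for plural counts.
        objects.append(_obj(name.removesuffix("s") if count > 1 else name, count))
    return Room(room_type, float(width), float(depth), objects)

def parse_floor_plan(plan: dict) -> Room:
    """Toy CAD front-end: room size from the outline's bounding box, objects from symbols."""
    xs = [x for x, _ in plan["outline"]]
    ys = [y for _, y in plan["outline"]]
    counts: dict[str, int] = {}
    for symbol in plan["symbols"]:
        counts[symbol] = counts.get(symbol, 0) + 1
    objects = [_obj(category, n) for category, n in counts.items()]
    return Room(plan["room_type"], float(max(xs) - min(xs)),
                float(max(ys) - min(ys)), objects)

text_room = parse_text("4x3 m kitchen with a fridge, 2 chairs")
plan_room = parse_floor_plan({
    "room_type": "kitchen",
    "outline": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "symbols": ["fridge", "chair", "chair"],
})
print(text_room == plan_room)  # True: both modalities yield the same structure
```

Because both toy front-ends converge on one schema, downstream scene synthesis in such a design would only need to consume that single representation. This mirrors the motivation stated in the abstract rather than any documented RoomPilot interface.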

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
3D Indoor Scene Synthesis | Bedroom (Standard Split) | CNR | 0.00 | 13
Indoor Scene Synthesis | User Study | Visual Quality | 4.1 | 8
3D Scene Synthesis | Detailed Language Instructions, Living Room | Object Count | 26.3 | 6
3D Scene Synthesis | Detailed Language Instructions, Dining Room | # Objects | 21.2 | 6
Controllable Indoor Scene Synthesis | Indoor Scene Synthesis Controllability Evaluation | LF | 58 | 6
3D Scene Synthesis | Detailed Language Instructions, Kitchen | Object Count Score | 10.6 | 6
3D Scene Synthesis | Detailed Language Instructions, Bathroom | Object Count | 10.2 | 6
3D Scene Synthesis | Detailed Language Instructions, Average | Object Count (#Obj) | 16.5 | 6
