RoomPilot: Controllable Indoor Scene Synthesis via Multimodal Semantic Parsing

About

Generating controllable indoor scenes is fundamental to applications in game development, architectural visualization, and embodied AI. However, existing approaches either support only a limited set of input modalities or rely on implicit generation processes that hinder precise control over scene structure and semantics. To address these limitations, we introduce RoomPilot, a unified framework for controllable indoor scene synthesis from multi-modal inputs, including textual descriptions and CAD floor plans. RoomPilot maps heterogeneous inputs into an Indoor Domain-Specific Language (IDSL), which serves as a structured and interpretable semantic representation for describing indoor scenes. Built upon IDSL, RoomPilot presents a hierarchical synthesis pipeline that progressively organizes scenes at the building, room, and object levels, promoting structural coherence and functional consistency across multi-room layouts. Moreover, RoomPilot constructs a curated asset dataset with rich semantic annotations to support high-quality scene synthesis, improving visual realism and appearance consistency. Extensive experiments demonstrate effective multi-modal understanding, fine-grained controllability in scene generation, and improved physical consistency and visual fidelity, marking a significant step toward controllable 3D indoor scene synthesis. Code and model will be made available.

Wentang Chen, Shougao Zhang, Yiman Zhang, Tianhao Zhou, Ruihui Li • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Indoor Scene Synthesis | Bedroom (Standard Split) | CNR | 0.00e+0 | 13
3D Scene Synthesis | Detailed Language Instructions Average | Object Count (#Obj) | 16.5 | 11
Indoor Scene Synthesis | User Study | Visual Quality | 4.1 | 8
3D Scene Synthesis | Detailed Language Instructions Living Room | Object Count | 26.3 | 6
3D Scene Synthesis | Detailed Language Instructions Dining Room | # Objects | 21.2 | 6
Controllable Indoor Scene Synthesis | Indoor Scene Synthesis Controllability Evaluation | LF | 58 | 6
3D Scene Synthesis | Detailed Language Instructions Kitchen | Object Count Score | 10.6 | 6
3D Scene Synthesis | Detailed Language Instructions Bathroom | Object Count | 10.2 | 6