PARSE: Part-Aware Relational Spatial Modeling

About

Inter-object relations underpin spatial intelligence, yet existing representations -- linguistic prepositions or object-level scene graphs -- are too coarse to specify which regions actually support, contain, or contact one another, leading to ambiguous and physically inconsistent layouts. To address these ambiguities, a part-level formulation is needed; therefore, we introduce PARSE, a framework that explicitly models how object parts interact to determine feasible and spatially grounded scene configurations. PARSE centers on the Part-centric Assembly Graph (PAG), which encodes geometric relations between specific object parts, and a Part-Aware Spatial Configuration Solver that converts these relations into geometric constraints to assemble collision-free, physically valid scenes. Using PARSE, we build PARSE-10K, a dataset of 10,000 3D indoor scenes constructed from real-image layout priors and a curated part-annotated shape database, each with dense contact structures and a part-level contact graph. With this structured, spatially grounded supervision, fine-tuning Qwen3-VL on PARSE-10K yields stronger object-level layout reasoning and more accurate part-level relation understanding; furthermore, leveraging PAGs as structural priors in 3D generation models leads to scenes with substantially improved physical realism and structural complexity. Together, these results show that PARSE significantly advances geometry-grounded spatial reasoning and supports the generation of physically consistent 3D scenes.
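To make the part-level idea concrete, here is a minimal sketch of how a graph of part-to-part relations might be represented and collapsed back to the coarser object-level view the abstract contrasts it with. This is an illustration only, not the authors' PAG implementation: the class names (`Part`, `PartRelation`, `PartAssemblyGraph`), the relation labels, and the example objects are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Part:
    """A named part of an object (hypothetical schema, not from the paper)."""
    object_id: str
    part_name: str


@dataclass(frozen=True)
class PartRelation:
    """A directed relation between two parts, e.g. 'supports' or 'contacts'."""
    source: Part
    target: Part
    kind: str


@dataclass
class PartAssemblyGraph:
    """Toy container for part-level relations (illustrative only)."""
    relations: list = field(default_factory=list)

    def add_relation(self, source: Part, target: Part, kind: str) -> PartRelation:
        rel = PartRelation(source, target, kind)
        self.relations.append(rel)
        return rel

    def object_level_edges(self) -> set:
        """Collapse part-level relations to (object, kind, object) triples.

        This discards which parts are involved, which is the information
        an object-level scene graph lacks.
        """
        return {(r.source.object_id, r.kind, r.target.object_id)
                for r in self.relations}

    def parts_touching(self, object_id: str) -> set:
        """Return the parts of other objects related to any part of object_id."""
        touching = set()
        for r in self.relations:
            if r.source.object_id == object_id:
                touching.add(r.target)
            elif r.target.object_id == object_id:
                touching.add(r.source)
        return touching


def build_example() -> PartAssemblyGraph:
    """Assumed toy scene: a mug resting on a tabletop, table legs on the floor."""
    graph = PartAssemblyGraph()
    graph.add_relation(Part("table", "tabletop"), Part("mug", "base"), "supports")
    graph.add_relation(Part("floor", "surface"), Part("table", "leg"), "supports")
    return graph
```

In this toy scene, the object-level view only records that the table supports the mug, while the part-level relation records that it is the tabletop supporting the mug's base. A solver would need that extra detail to turn the relation into a geometric constraint.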

Yinuo Bai, Peijun Xu, Kuixiang Shao, Yuyang Jiao, Jingxuan Zhang, Kaixin Yao, Jiayuan Gu, Jingyi Yu (2026)

Related benchmarks

Task | Dataset | Result | Rank
Part-level Contact MCQ | PARSE-10K (test) | Accuracy: 86.2 | 6
Scene Graph Generation | PARSE-10K (test) | Recall (BBox Match): 73.2 | 6
Visual Relation MCQ | PARSE-10K (test) | Accuracy: 97.4 | 6
3D Scene Generation | PARSE-10K User Study 1.0 (test) | Complexity: 47.5 | 3
