Scene Synthesis from Human Motion

About

Large-scale capture of human motion in diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone carries rich information about the scenes that people occupy and interact with. For example, a sitting human suggests the existence of a chair, and the position of their legs further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), proceeds in two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.
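
The two-step idea in the abstract (temporally consistent contact labels first, then choosing an interacting object) can be illustrated with a toy sketch. This is not ContactFormer or SUMMON's actual code: the sliding-window majority vote is a simple stand-in for a learned temporal model, and the label names, `smooth_labels`, and `pick_object` are hypothetical names introduced only for this illustration.

```python
from collections import Counter

def smooth_labels(labels, window=3):
    """Sliding-window majority vote over per-frame contact labels.

    A hand-written stand-in for a learned temporal model (NOT ContactFormer).
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        # Keep the most frequent label inside the window around frame i.
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def pick_object(labels, ignore="none"):
    """Choose the object category that appears most often among contact frames."""
    counts = Counter(label for label in labels if label != ignore)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical per-frame predictions with one noisy "sofa" frame.
frames = ["none", "chair", "chair", "sofa", "chair", "chair", "none"]
smoothed = smooth_labels(frames)
chosen = pick_object(smoothed)
```

In this toy input, smoothing removes the isolated "sofa" frame, so the object choice rests on consistent evidence. The actual framework additionally optimizes physical plausibility losses to place the chosen object, which this sketch does not attempt.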

Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu • 2023

Related benchmarks

Task	Dataset	Metric	Result
Scene Synthesis	HUMANISE (test)	CD	5.326	7
Scene Synthesis	PRO-teXt (test)	CD	2.1437	7
Scene Synthesis	PRO-teXt	3D IP	0.0559	5
Scene Synthesis	HUMANISE	3D IP	7.19	5
Contact Object Recovery	smoothed PROXD (val)	Non-collision Score	0.851	4
Contact Object Recovery	GIMO (unseen)	Non-collision Score	95.1	4
Contact Semantic Prediction	PROXD (val)	Reconstruction Accuracy	91.2	4
