
Scene Synthesis from Human Motion

About

Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scene a person resides in and interacts with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.

Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Scene Synthesis | HUMANISE (test) | CD | 5.326 | 7 |
| Scene Synthesis | PRO-teXt (test) | CD | 2.1437 | 7 |
| Scene Synthesis | PRO-teXt | 3D IP | 0.0559 | 5 |
| Scene Synthesis | HUMANISE | 3D IP | 7.19 | 5 |
| Contact Object Recovery | smoothed PROXD (val) | Non-collision Score | 0.851 | 4 |
| Contact Object Recovery | GIMO (unseen) | Non-collision Score | 95.1 | 4 |
| Contact Semantic Prediction | PROXD (val) | Reconstruction Accuracy | 91.2 | 4 |
