Joint-Relation Transformer for Multi-Person Motion Prediction

About

Multi-person motion prediction is a challenging problem because motion depends on both an individual's past movements and their interactions with other people. Transformer-based methods have shown promising results on this task, but they lack an explicit relation representation between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. Our relation information contains the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention to update both features. Additionally, we supervise the relation information by forecasting future distance. Experiments show that our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.

Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya Zhang, Yanfeng Wang • 2023
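
The abstract names two kinds of relation information: the relative distance between joints and the intra-/inter-person physical constraints. The sketch below shows one plausible way to compute such inputs for a single frame with NumPy. It is an illustration only, not the authors' implementation: the function name `relation_features`, the array layout `(persons, joints, 3)`, and the encoding of the constraints as boolean matrices are all assumptions made here.

```python
import numpy as np

def relation_features(poses, skeleton_edges):
    """Hypothetical helper: build pairwise relation inputs for one frame.

    poses:          array of shape (P, J, 3), P persons with J joints each.
    skeleton_edges: list of (i, j) joint-index pairs forming bones in one skeleton.
    """
    num_persons, num_joints, _ = poses.shape
    joints = poses.reshape(num_persons * num_joints, 3)

    # Relative distance: Euclidean distance between every pair of joints,
    # across all persons in the scene.
    diff = joints[:, None, :] - joints[None, :, :]
    distance = np.linalg.norm(diff, axis=-1)  # shape (P*J, P*J)

    # Intra- vs inter-person indicator: True where both joints belong to one person.
    person_id = np.repeat(np.arange(num_persons), num_joints)
    same_person = person_id[:, None] == person_id[None, :]

    # Skeleton structure: True where two joints of the same person share a bone.
    connected = np.zeros_like(same_person)
    for p in range(num_persons):
        for i, j in skeleton_edges:
            a, b = p * num_joints + i, p * num_joints + j
            connected[a, b] = connected[b, a] = True

    return distance, same_person, connected
```

Called on an array of two persons with three joints each, the function returns three 6×6 matrices. How features like these are embedded and fused with joint features through relation-aware attention is specific to the paper and is not reproduced here.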

Related benchmarks

Task	Dataset	Result
Multi-agent human pose forecasting	JRDB-GlobMultiPose Short-term (test)	JPE 237.9	8
Multi-agent human pose forecasting	JRDB-GlobMultiPose Long-term (test)	JPE 351.9	8
Multi-agent human pose forecasting	CMU-Mocap UMPM (test)	JPE 168.5	8
Multi-person motion prediction	CMU-Mocap UMPM 3 persons	JPE (0.2s) 32	8
Multi-agent human pose forecasting	3DPW (test)	JPE 181.9	8
Single person pose forecasting	CMU MOCAP	MPJPE (1000ms) 255	8
Single-person motion forecasting	CMU MOCAP	MPJPE (400ms) 99	8
Multi-person motion prediction	Mix1 6 persons	JPE (0.2s) 32	7
Multi-person motion prediction	Mix2 10 persons	JPE (0.2s) 36	7
Multi-agent Pose Forecasting	CMU-Mocap UMPM (test)	JPE (0.2s) 31.5	4
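
Several rows above report MPJPE (mean per-joint position error), which in its standard form is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints. A minimal sketch of that standard definition follows. The function name `mpjpe` and the array layout are assumptions, and each benchmark may apply its own units, alignment, and evaluation horizon, so this is not guaranteed to reproduce the numbers in the table.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: arrays of shape (..., J, 3) in the same units.
    Returns the Euclidean joint error averaged over all leading axes and joints.
    """
    per_joint_error = np.linalg.norm(pred - target, axis=-1)
    return float(per_joint_error.mean())
```

For example, if every predicted joint is offset from its ground-truth position by the vector (3, 4, 0), each per-joint error is 5 and so is the mean.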

