A generic diffusion-based approach for 3D human pose prediction in the wild

About

Predicting 3D human poses in real-world scenarios, also known as human pose forecasting, is inevitably subject to noisy inputs arising from inaccurate 3D pose estimations and occlusions. To address these challenges, we propose a diffusion-based approach that can predict future poses from noisy observations. We frame the prediction task as a denoising problem in which the observation and the prediction are treated as a single sequence containing missing elements, whether in the observation or in the prediction horizon. All missing elements are treated as noise and denoised with our conditional diffusion model. To better handle long forecasting horizons, we present a temporal cascaded diffusion model. We demonstrate the benefits of our approach on four publicly available datasets (Human3.6M, HumanEva-I, AMASS, and 3DPW), where it outperforms the state of the art. Additionally, we show that our framework is generic enough to improve any 3D pose prediction model, serving as a pre-processing step to repair its inputs and as a post-processing step to refine its outputs. The code is available online: https://github.com/vita-epfl/DePOSit.
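The "single sequence with missing elements" framing can be illustrated with a minimal sketch. It is not taken from the DePOSit code: the function name `build_denoising_input`, the flat per-frame coordinate layout, and the use of standard Gaussian noise for unknown entries are assumptions made for illustration only. It builds the concatenated observation-plus-horizon sequence and a mask marking which values are known (the conditioning) and which must be denoised.

```python
import random


def build_denoising_input(observed, horizon, missing=frozenset(), seed=0):
    """Join the observation and an empty prediction horizon into one sequence.

    observed: list of frames, each a list of joint coordinates (floats).
    horizon:  number of future frames to predict.
    missing:  set of (frame, coord) index pairs in the observation that are
              unavailable, e.g. occluded joints (hypothetical input format).
    Returns (sequence, mask), where mask[t][d] is 1 for a known value and 0
    for a value that a denoising model would have to fill in.
    """
    rng = random.Random(seed)
    dims = len(observed[0])
    sequence, mask = [], []
    for t in range(len(observed) + horizon):
        row, row_mask = [], []
        for d in range(dims):
            known = t < len(observed) and (t, d) not in missing
            # Unknown entries start as pure noise (assumed N(0, 1) here).
            row.append(observed[t][d] if known else rng.gauss(0.0, 1.0))
            row_mask.append(1 if known else 0)
        sequence.append(row)
        mask.append(row_mask)
    return sequence, mask


# Two observed frames with one occluded coordinate, two frames to predict.
obs = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]]
seq, mask = build_denoising_input(obs, horizon=2, missing={(1, 0)})
```

In this sketch an occluded observation entry and a future frame are handled identically: both are masked out and initialised with noise, which mirrors the abstract's point that missing elements are treated the same way whether they fall in the observation or in the prediction horizon.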

Saeed Saadatnejad, Ali Rasekh, Mohammadreza Mofayezi, Yasamin Medghalchi, Sara Rajabzadeh, Taylor Mordan, Alexandre Alahi • 2022

Related benchmarks

Task	Dataset	Result
3D Human Pose Prediction	Human3.6M Setting-A	ADE: 356	13
3D Human Pose Prediction	HumanEva I	ADE: 199	12
3D Human Pose Forecasting	Human3.6M (test)	ADE: 0.603	10
Human Pose Prediction	Human3.6M Setting-B	FDE (80ms): 9.9	9
Human Motion Prediction	Human3.6M (Setting-C)	FDE (Random Leg, Arm Occlusions): 77.5	6
Human Pose Prediction	AMASS 37 (long-term)	FDE (560ms): 49.8	5
Human Pose Prediction	3DPW 56 (long-term)	FDE (560ms): 55.4	5
Human Motion Prediction	Human3.6M (Setting-D)	ADE @ 80ms: 7.4	4
Human Motion Prediction	Human3.6M Setting-E (test)	Error (80ms): 8.3	4
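For readers unfamiliar with the metric names in the table, the following is a small sketch of how ADE (average displacement error) and FDE (final displacement error) are commonly computed. The convention used here, the Euclidean distance between the flattened predicted and ground-truth pose vectors at each frame, is an assumption; individual benchmarks may average per joint or use different units, so the numbers it produces are not directly comparable to the table. The function name `displacement_errors` is hypothetical.

```python
import math


def displacement_errors(pred, target):
    """Return (ADE, FDE) for two equally long pose sequences.

    pred, target: lists of frames, each frame a flat list of coordinates.
    ADE is the mean per-frame Euclidean distance; FDE is the distance at
    the last frame only.
    """
    per_frame = [math.dist(p, t) for p, t in zip(pred, target)]
    ade = sum(per_frame) / len(per_frame)
    fde = per_frame[-1]
    return ade, fde


# Toy example: the first frame is off by a 3-4-5 triangle, the last is exact.
pred = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
target = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
ade, fde = displacement_errors(pred, target)
```

Here the per-frame errors are 5.0 and 0.0, so ADE is 2.5 while FDE is 0.0, which shows why the two metrics can rank methods differently.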
