HMP: Hand Motion Priors for Pose and Shape Estimation from Video

About

Understanding how humans interact with the world requires accurate 3D hand pose estimation, a task complicated by the hand's high degree of articulation, frequent occlusions and self-occlusions, and rapid motions. While most existing methods rely on single-image inputs, videos provide useful cues for addressing these issues. However, existing video-based 3D hand datasets are insufficient for training feedforward models that generalize to in-the-wild scenarios. On the other hand, large human motion capture datasets that also include hand motions, e.g. AMASS, are available. Therefore, we develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions. This motion prior is then employed for video-based 3D hand motion estimation following a latent optimization approach. Integrating a robust motion prior significantly enhances performance, especially in occluded scenarios, and produces stable, temporally consistent results that surpass conventional single-frame methods. We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets, with special emphasis on an occlusion-focused subset of HO3D. Code is available at https://hmp.is.tue.mpg.de
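
To make the idea of latent optimization concrete, here is a toy sketch, not the authors' implementation. It assumes a hypothetical linear decoder `W` standing in for the learned motion prior (the real prior is a generative model trained on AMASS), a binary visibility `mask` marking occluded observations, and a simple L2 penalty on the latent code. The function name `fit_latent` and all parameters are illustrative.

```python
import numpy as np

def fit_latent(W, obs, mask, lam=1e-3, steps=500):
    """Minimize ||mask * (W @ z - obs)||^2 + lam * ||z||^2 over z by gradient descent.

    W    : (N, d) hypothetical linear decoder from latent code to motion values
    obs  : (N,) observed motion values (entries where mask == 0 are ignored)
    mask : (N,) binary visibility, 1 = observed, 0 = occluded
    """
    z = np.zeros(W.shape[1])
    # Safe step size from an upper bound on the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * (np.linalg.norm(W, 2) ** 2 + lam))
    for _ in range(steps):
        # Data term only counts visible entries (mask is binary, so mask**2 == mask).
        residual = mask * (W @ z - obs)
        grad = 2.0 * (W.T @ residual) + 2.0 * lam * z
        z -= lr * grad
    return z

# Toy data: a motion generated by the decoder, with about 30% of entries occluded.
rng = np.random.default_rng(0)
W = rng.normal(size=(60, 4))                  # stand-in for a learned prior
z_true = rng.normal(size=4)
motion = W @ z_true
mask = (rng.random(60) > 0.3).astype(float)
obs = motion * mask                           # occluded entries carry no signal

z_hat = fit_latent(W, obs, mask)
recovered = W @ z_hat                         # full motion, including occluded entries
```

Because the optimization runs over the low-dimensional latent code rather than over the motion itself, the decoder fills in the occluded entries in a way that is consistent with the visible ones, which is the intuition behind using a motion prior under occlusion.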

Enes Duran, Muhammed Kocabas, Vasileios Choutas, Zicong Fan, Michael J. Black • 2023

Related benchmarks

Task	Dataset	Metric	Result	
3D Hand Reconstruction	HO3D v3	PA-MPJPE	10.1	25
3D Mesh Reconstruction	HO3D v3	PA-MPJPE	10.1	9
World-space hand motion estimation	HOT3D 2	PA-MPJPE	11.72	7
3D Hand Pose Estimation	HO3D v3 (train)	PA-MPJPE	10.1	6
World-space hand motion estimation	ARCTIC 6 (test)	PA-MPJPE	13.73	6
3D Hand Pose Estimation	DexYCB 3 (S0)	PA-MPJPE	8.9	3
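
The metric reported in these rows, PA-MPJPE, is the mean per-joint position error after Procrustes alignment (a similarity transform of the prediction onto the ground truth). The following is a minimal NumPy sketch of that computation, assuming joints are given as `(J, 3)` arrays in a common unit; the function name `pa_mpjpe` is illustrative and does not come from the paper's code.

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """Procrustes-aligned mean per-joint position error.

    pred, gt: arrays of shape (J, 3) in the same unit.
    Returns the mean Euclidean joint error after the best
    similarity alignment (rotation, uniform scale, translation).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centering both joint sets.
    mu_g = gt.mean(axis=0)
    p = pred - pred.mean(axis=0)
    g = gt - mu_g

    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.array([1.0, 1.0, d])
    R = Vt.T @ np.diag(D) @ U.T

    # Optimal uniform scale for the rotated prediction.
    scale = (S * D).sum() / (p ** 2).sum()

    aligned = scale * (p @ R.T) + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Since the alignment removes global rotation, scale, and translation, the metric measures only the error in articulated pose and relative joint layout, not where the hand sits in the camera or world frame.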

