
Model-based 3D Hand Reconstruction via Self-Supervised Learning

About

Reconstructing a 3D hand from a single-view RGB image is challenging due to the variety of hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize the outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed method achieves performance comparable with recent fully-supervised methods while using less supervision.
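The core self-supervision signal described above is the consistency between predicted 3D joints and noisy 2D detected keypoints. A minimal sketch of such a 2D-3D consistency loss is shown below; the weak-perspective camera, function names, and confidence weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def project_weak_perspective(joints_3d, scale, trans):
    """Project 3D joints to the image plane with a weak-perspective camera.

    joints_3d: (J, 3) array of predicted 3D joint positions.
    scale:     scalar camera scale (hypothetical parameterization).
    trans:     (2,) 2D translation on the image plane.
    """
    return scale * joints_3d[:, :2] + trans

def keypoint_consistency_loss(joints_3d, keypoints_2d, confidence, scale, trans):
    """Confidence-weighted squared error between projected 3D joints and
    noisy 2D keypoint detections (an assumed form of the consistency term).

    keypoints_2d: (J, 2) detected 2D keypoints.
    confidence:   (J,) per-keypoint detector confidence, used to
                  down-weight unreliable detections.
    """
    projected = project_weak_perspective(joints_3d, scale, trans)
    sq_err = np.sum((projected - keypoints_2d) ** 2, axis=1)  # per-joint error
    return float(np.sum(confidence * sq_err) / np.sum(confidence))
```

In training, this term would be minimized jointly with the paper's other losses (e.g. shape and texture regularizers) so that the network's 3D output stays consistent with the 2D evidence.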

Yujin Chen, Zhigang Tu, Di Kang, Linchao Bao, Ying Zhang, Xuefei Zhe, Ruizhi Chen, Junsong Yuan • 2021

Related benchmarks

Task                          Dataset                 Metric     Result    Rank
Hand Mesh Reconstruction      HO3D v2 (test)          F@5        0.44      34
3D Hand-Object Interaction    HO3D v2 (test)          PA-MPJPE   11.4      20
3D Hand Reconstruction        HO3D v3                 PA-MPJPE   11.5      18
Hand Reconstruction           HO3D v3 (test)          MPJPE      11.5      14
Novel View Synthesis          InterHand2.6M (test)    LPIPS      0.1512    12
3D Mesh Reconstruction        HO3D v3                 PA-MPJPE   11.5      9
Appearance Reconstruction     InterHand2.6M (test)    L1 Loss    0.0206    8
Appearance Reconstruction     RGB2Hands               L1 Loss    0.0179    4
Novel Pose Reconstruction     InterHand2.6M (test)    L1 Error   0.028     4
Novel Poses                   RGB2Hands               L1 Loss    0.0222    4

(Showing 10 of 11 rows)

Other info

Code
