HARP: Personalized Hand Reconstruction from a Monocular RGB Video
About
We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar with high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo, without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. It also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used to improve 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.
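To make the idea of an explicit, gradient-optimized displacement map concrete, here is a minimal toy sketch in Python. It is not HARP's pipeline: the abstract describes optimization through a shadow-aware differentiable renderer, whereas this sketch swaps in a direct vertex-position loss on a synthetic point set. The names `displace` and `fit_displacement`, the step size, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def displace(vertices, normals, displacement):
    """Offset each vertex along its unit normal by a per-vertex scalar."""
    return vertices + displacement[:, None] * normals


def fit_displacement(vertices, normals, target, steps=200, lr=0.1):
    """Fit per-vertex scalar displacements with plain gradient descent.

    Assumed toy loss: sum of squared distances between displaced and
    target vertices (a stand-in for a rendering-based photometric loss).
    """
    d = np.zeros(len(vertices))
    for _ in range(steps):
        residual = displace(vertices, normals, d) - target  # shape (N, 3)
        grad = 2.0 * np.sum(residual * normals, axis=1)     # dLoss/dd, shape (N,)
        d -= lr * grad
    return d


# Hypothetical toy data: points on a unit sphere, whose normals are the points.
rng = np.random.default_rng(0)
normals = rng.normal(size=(100, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
vertices = normals.copy()
true_d = 0.05 * rng.normal(size=100)
target = displace(vertices, normals, true_d)

fitted = fit_displacement(vertices, normals, target)
```

Because every parameter here is an explicit per-vertex quantity, the gradient has a closed form and no network weights are involved, which is the property the abstract contrasts with neural implicit representations.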
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Novel View Synthesis | InterHand2.6M (test) | LPIPS | 0.1367 | 12 |
| Appearance reconstruction | InterHand2.6M (test) | L1 Loss | 0.0157 | 8 |
| Appearance reconstruction | RGB2Hands | L1 Loss | 0.0155 | 4 |
| Novel Pose Reconstruction | InterHand2.6M (test) | L1 Error | 0.0256 | 4 |
| Novel Poses | RGB2Hands | L1 Loss | 0.0208 | 4 |
| 3D Hand Avatar Reconstruction | HARP subject_1 (sequences 6-9) (test) | PSNR | 27.5 | 3 |
| 3D Hand Avatar Reconstruction | Phone scan dataset (test) | PSNR | 29.89 | 3 |
| Contact estimation | MANUS-Grasps (Subject1) | mIoU | 0.173 | 3 |
| Contact estimation | MANUS-Grasps (Subject2) | mIoU | 14.8 | 3 |
| Contact estimation | MANUS-Grasps (Subject3) | mIoU | 0.224 | 3 |
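
For readers unfamiliar with the image metrics in the table above, the following sketch shows the standard definitions of mean L1 error and PSNR. The exact evaluation protocol behind the listed numbers (masking, cropping, color range) is not stated on this page, so the assumption here is unmasked images stored as floats in [0, 1]; the function names are illustrative.

```python
import numpy as np


def l1_error(pred, target):
    """Mean absolute per-pixel difference (lower is better)."""
    return float(np.mean(np.abs(pred - target)))


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    Assumes pixel values lie in [0, max_value].
    """
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Hypothetical example: a black image versus a uniform 0.1-gray image.
pred = np.zeros((4, 4, 3))
target = np.full((4, 4, 3), 0.1)
example_l1 = l1_error(pred, target)
example_psnr = psnr(pred, target)
```

LPIPS and mIoU are omitted from this sketch because the former requires a pretrained network and the latter depends on how contact regions are defined for each dataset.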