Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild
About
We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand action in YouTube videos and using it as a source of weak supervision. Our weakly-supervised, mesh-convolution-based system largely outperforms state-of-the-art methods, even halving the errors on the in-the-wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands.
Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, Stefanos Zafeiriou • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Hand Reconstruction | FreiHAND (test) | F@15mm | 96.6 | 148 |
| Hand Reconstruction | InterHand 2.6M (test) | MPJPE | 9.95 | 29 |
| 3D Hand Reconstruction | DexYCB (test) | MPVPE | 9.39 | 28 |
| 3D Hand Pose Estimation | RHD (val) | AUC (PCK) | 95 | 6 |
| 3D Hand Pose Estimation | MPII+NZSL (val) | AUC (PCK) | 0.701 | 4 |
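
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how mean per-point error (the quantity behind MPJPE for joints and MPVPE for mesh vertices) and an F-score at a distance threshold (as in F@15mm) are commonly computed. It is not the evaluation code of any of the listed benchmarks: the function names `mean_per_point_error` and `f_score` are hypothetical, the inputs are assumed to be lists of 3D points in millimetres, and no alignment step (such as root-relative or Procrustes alignment, which some benchmarks apply first) is included.

```python
import math


def mean_per_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    With joints as points this is MPJPE; with mesh vertices, MPVPE.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def f_score(pred, gt, threshold):
    """F-score between two point sets at a given distance threshold.

    Precision: fraction of predicted points within `threshold` of some
    ground-truth point. Recall: the same with the roles swapped.
    Nearest neighbours are found by brute force (fine for small sets).
    """
    def fraction_within(src, dst):
        hits = sum(
            1 for s in src
            if min(math.dist(s, d) for d in dst) <= threshold
        )
        return hits / len(src)

    precision = fraction_within(pred, gt)
    recall = fraction_within(gt, pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with three points (illustrative values, not benchmark data).
gt_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred_points = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 20.0)]

print(mean_per_point_error(pred_points, gt_points))  # per-point errors are 5, 0, 20
print(f_score(pred_points, gt_points, threshold=15.0))
```

Lower is better for MPJPE and MPVPE, while higher is better for the F-score and for AUC (PCK).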