OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer
About
In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: the lack of a unified solution for handling various hand image inputs, and the neglect of the positional relationship between two hands within an image. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies that can adapt to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method that embeds positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. Because this tokenization encodes the relative relationship between the two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module that fuses hand tokens in 4D with attention and decodes them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. Results on in-the-wild videos and real-world scenarios demonstrate the superior performance of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hand Reconstruction | InterHand 2.6M (test) | -- | -- | 29 |
| 3D Hand Reconstruction | FreiHAND | PA MPVPE | 5.7 | 25 |
| 3D Hand-Object Reconstruction | HO3D v2 | MPJPE | 6.9 | 16 |
| 3D Hand Reconstruction | ReIH (test) | MRRPE (Mean Relative Error) | 42.26 | 13 |
| Interacting Hand Reconstruction | Interhand2.6M 30fps v1.0 | Acceleration Error (Accel_E) | 2.81 | 10 |
| 3D Hand Pose and Shape Estimation | InterHand 5fps 2.6M (test) | MPJPE | 7.49 | 10 |
| Interacting Hand Reconstruction | Interhand2.6m 5fps (test) | MPJPE | 5.93 | 9 |
| 3D Bimanual Hand-Object Reconstruction | ARCTIC (test) | -- | -- | 8 |
| Single hand temporal reconstruction | Dex-YCB (test) | Ra-MAJPE | 8.37 | 7 |
| Single hand recovery | Dex-YCB | Ra-MPJPE | 6.44 | 6 |
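
For readers unfamiliar with the error metrics named in the table, the following is a minimal sketch of how MPJPE and an acceleration error are commonly computed. It is not the evaluation code used by the paper or by any benchmark: the function names, the choice of joint 0 as the root, and the unit-free finite-difference acceleration are all assumptions made for illustration, and reading the `Ra-` prefix as a root-relative variant is likewise an assumption.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between corresponding predicted and ground-truth 3D joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def root_relative(joints, root_index=0):
    """Subtract the root joint from every joint (root_index=0 is an assumption)."""
    root = joints[root_index]
    return [tuple(c - r for c, r in zip(j, root)) for j in joints]

def accel_error(pred_seq, gt_seq):
    """Mean distance between predicted and ground-truth joint accelerations,
    where acceleration is the second-order finite difference over three
    consecutive frames (no frame-rate scaling is applied in this sketch)."""
    def accel(seq):
        frames = []
        for t in range(1, len(seq) - 1):
            frames.append([
                tuple(a - 2 * b + c for a, b, c in zip(prev, cur, nxt))
                for prev, cur, nxt in zip(seq[t - 1], seq[t], seq[t + 1])
            ])
        return frames
    pred_acc, gt_acc = accel(pred_seq), accel(gt_seq)
    return sum(mpjpe(p, g) for p, g in zip(pred_acc, gt_acc)) / len(gt_acc)
```

Under these definitions, a prediction that differs from the ground truth only by a constant translation has a nonzero MPJPE but a root-relative MPJPE of zero, which is why root-relative or aligned variants are often reported alongside the raw metric.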