PAD-Hand: Physics-Aware Diffusion for Hand Motion Recovery
About
Recent advances in reconstructing hands from images deliver accurate single-frame estimates, yet these estimates often lack physical consistency and give no indication of how confidently the motion satisfies physics. In this paper, we propose a novel physics-aware conditional diffusion framework that refines noisy pose sequences into physically plausible hand motion while estimating the physics variance of the motion estimates. Building on a MeshCNN-Transformer backbone, we formulate Euler-Lagrange dynamics for articulated hands. Unlike prior works that enforce zero residuals, we treat the resulting dynamic residuals as virtual observables to integrate physics more effectively. Through a last-layer Laplace approximation, our method produces per-joint, per-time variances that measure physics consistency, and it offers interpretable variance maps indicating where physical consistency weakens. Experiments on two well-known hand datasets show consistent gains over strong image-based initializations and competitive video-based methods. Qualitative results confirm that our variance estimates align with the physical plausibility of the motion in image-based estimates.
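To make the notion of a dynamic residual concrete, here is a minimal sketch on a one-joint pendulum rather than an articulated hand. The abstract does not give the paper's dynamics model, residual definition, or likelihood, so everything below is an assumption made for illustration: the function names (`finite_diff_accel`, `dynamics_residual`, `residual_nll`), the pendulum parameters, and the Gaussian form used to score the residual as a virtual observation of zero.

```python
import math

# Toy illustration only: a single-joint pendulum stands in for an articulated hand.
# All names and parameter values are hypothetical, not taken from the paper.

def finite_diff_accel(q, dt):
    """Central second differences: acceleration at interior samples 1..len(q)-2."""
    return [(q[i + 1] - 2.0 * q[i] + q[i - 1]) / dt ** 2
            for i in range(1, len(q) - 1)]

def dynamics_residual(q, tau, dt, m=1.0, l=1.0, g=9.81):
    """Euler-Lagrange residual r_t = m*l^2*q''_t + m*g*l*sin(q_t) - tau_t."""
    qdd = finite_diff_accel(q, dt)
    # qdd[k] corresponds to sample k + 1 of q and tau.
    return [m * l * l * a + m * g * l * math.sin(q[k + 1]) - tau[k + 1]
            for k, a in enumerate(qdd)]

def residual_nll(residuals, sigma):
    """Negative log-likelihood of the virtual observation r_t = 0
    under the assumed model r_t ~ N(0, sigma^2)."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(0.5 * (r / sigma) ** 2 + const for r in residuals)

dt = 1.0 / 30.0                        # assumed 30 fps pose sequence
angle = 0.3                            # joint angle in radians
n = 8
tau = [9.81 * math.sin(angle)] * n     # torque that holds the joint still
q_static = [angle] * n                 # physically consistent: no motion
q_jitter = [angle + (0.02 if i % 2 else -0.02) for i in range(n)]  # frame-to-frame jitter

r_static = dynamics_residual(q_static, tau, dt)
r_jitter = dynamics_residual(q_jitter, tau, dt)

print("max |r| static:", max(abs(r) for r in r_static))
print("max |r| jitter:", max(abs(r) for r in r_jitter))
print("NLL static:", residual_nll(r_static, sigma=1.0))
print("NLL jitter:", residual_nll(r_jitter, sigma=1.0))
```

In this toy setting, the jittery sequence implies large accelerations that the holding torque cannot explain, so its residuals and negative log-likelihood are much larger than those of the static sequence. Scoring the residual with a likelihood, instead of forcing it to zero, is one plausible reading of "virtual observables"; the paper's actual formulation may differ.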
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Hand Reconstruction | DexYCB (official evaluation) | MPJPE | 10.56 | 8 |
| 3D Hand Reconstruction | HO3D (official evaluation) | PA-MPJPE | 7.43 | 7 |
| Hand Motion Recovery | HO3D | PA-MPJPE | 7.43 | 3 |
| Hand Motion Recovery | DexYCB | PA-MPJPE | 4.63 | 3 |
| Hand Pose Estimation | TACO S1 (test) | PA-MPJPE | 8.02 | 3 |