OnlineHMR: Video-based Online World-Grounded Human Mesh Recovery
About
Human mesh recovery (HMR) reconstructs the 3D human body from monocular videos, and recent works extend it to world-coordinate human trajectory and motion reconstruction. However, most existing methods remain offline, relying on future frames or global optimization, which limits their applicability in interactive feedback and perception-action-loop scenarios such as AR/VR and telepresence. To address this, we propose OnlineHMR, a fully online framework that jointly satisfies four essential criteria of online processing: system-level causality, faithfulness, temporal consistency, and efficiency. Built on a two-branch architecture, OnlineHMR enables streaming inference via a causal key-value cache design and a curated sliding-window learning strategy. Meanwhile, a human-centric incremental SLAM module provides online world-grounded alignment with physically plausible trajectory correction. Experimental results show that our method achieves performance comparable to existing chunk-based approaches on the standard EMDB benchmark and on highly dynamic custom videos, while uniquely supporting online processing. The project page and code are available at https://tsukasane.github.io/Video-OnlineHMR/.
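The causal key-value cache can be pictured as a fixed-size sliding window over per-frame features, where each incoming frame attends only over cached past frames. The sketch below is an illustrative assumption about this mechanism, not OnlineHMR's actual implementation; the class name, window size, and single-head attention are all hypothetical.

```python
import numpy as np
from collections import deque

class CausalKVCache:
    """Illustrative sliding-window cache of per-frame key/value features.

    Hypothetical sketch: OnlineHMR's real cache design may differ. The point
    is that each frame attends only over cached (past) frames, so inference
    stays causal -- no future frames are ever required.
    """

    def __init__(self, window: int, dim: int):
        self.dim = dim
        # deque with maxlen evicts the oldest frame once the window is full
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def step(self, query, key, value):
        """Append the current frame's K/V, then attend causally over the cache."""
        self.keys.append(key)
        self.values.append(value)
        K = np.stack(self.keys)                 # (t, dim), t <= window
        V = np.stack(self.values)
        scores = K @ query / np.sqrt(self.dim)  # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                # softmax over cached frames
        return weights @ V                      # pooled feature for this frame

cache = CausalKVCache(window=16, dim=4)
rng = np.random.default_rng(0)
for _ in range(32):                             # stream frames one at a time
    q = k = v = rng.standard_normal(4)
    out = cache.step(q, k, v)
```

Because the deque bounds the cache to a constant number of frames, per-frame cost stays flat as the stream grows, which is what makes this kind of design suitable for online processing.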
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Human Mesh Recovery | 3DPW (test) | MPJPE: 69.9 | 299 |
| Human global trajectory and motion reconstruction | EMDB 2 | WA-MPJPE100: 93.5 | 17 |
| Camera-coordinate Human Mesh Recovery | EMDB-1 (test) | PA-MPJPE: 46 | 13 |
| World-coordinate Human Mesh Recovery | EMDB-2 v1.0 (test) | FPS: 3.3 | 8 |