RAM: Recover Any 3D Human Motion in-the-Wild
About
RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further improves human motion reconstruction by injecting spatio-temporal priors, yielding consistent and smooth motion estimates. In addition, a lightweight Predictor module forecasts future poses to maintain reconstruction continuity, while a gated combiner adaptively fuses reconstructed and predicted features to keep the output coherent and robust. Experiments on in-the-wild multi-person benchmarks such as PoseTrack and 3DPW demonstrate that RAM substantially outperforms the previous state of the art in both zero-shot tracking stability and 3D accuracy, offering a generalizable paradigm for markerless 3D human motion capture in the wild.
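The abstract does not give the gated combiner's equations. As a minimal sketch, assuming a per-channel sigmoid gate (a common gated-fusion form, not confirmed as RAM's actual design), fusing a reconstructed feature vector `r` with a predicted one `p` could look like the code below. The function `gated_combine` and its weight and bias arguments are hypothetical; a learned module would more likely use a full linear layer over the concatenated features.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_combine(
    recon: Sequence[float],
    pred: Sequence[float],
    w_r: Sequence[float],
    w_p: Sequence[float],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical per-channel gated fusion (illustrative only).

    For each channel i:
        g_i   = sigmoid(w_r[i] * r_i + w_p[i] * p_i + bias[i])
        out_i = g_i * r_i + (1 - g_i) * p_i
    A gate near 1 trusts the reconstructed feature; near 0 it trusts
    the predicted one (e.g. when the person is occluded).
    """
    n = len(recon)
    if not (len(pred) == len(w_r) == len(w_p) == len(bias) == n):
        raise ValueError("all inputs must have the same length")
    out = []
    for r, p, a, c, b in zip(recon, pred, w_r, w_p, bias):
        g = sigmoid(a * r + c * p + b)
        out.append(g * r + (1.0 - g) * p)
    return out


# With zero weights and bias the gate is 0.5, so the output is the mean.
print(gated_combine([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

With all weights at zero the gate sits at 0.5 and the output is the channel-wise average; a large positive bias pushes the gate toward 1 so the reconstructed feature dominates, and a large negative bias hands control to the predicted feature.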
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 2D Pose Estimation | COCO (test val) | PCK@0.05 | 89 | 8 |
| 2D Pose Estimation | PoseTrack (test val) | PCK@0.05 | 93 | 8 |
| 3D Pose Estimation | 3DPW (test val) | MPJPE | 53 | 8 |
| Multiple Object Tracking | PoseTrack18 | HOTA | 66.4 | 5 |
| Human Motion Tracking | TrackID3x3 Indoor 1.0 | TI-HOTA | 75.07 | 4 |
| Human Motion Tracking | TrackID3x3 Outdoor 1.0 | TI-HOTA | 66.68 | 4 |
| Multiple Object Tracking | PoseTrack 21 | MOTA | 74.4 | 4 |
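For reference, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below assumes root-relative alignment on a single frame with joint 0 as the root. This page does not state the exact evaluation protocol (root joint, units, or whether Procrustes alignment is applied), so treat `mpjpe` and its `root` parameter as illustrative rather than the benchmark's official scorer.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint], root: int = 0) -> float:
    """Root-aligned mean per-joint position error for one pose (illustrative).

    Both poses are translated so that joint `root` sits at the origin, then
    the Euclidean distances between corresponding joints are averaged.
    The result is in the same units as the input coordinates.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    pr, gr = pred[root], gt[root]
    total = 0.0
    for p, g in zip(pred, gt):
        p_rel = [p[i] - pr[i] for i in range(3)]
        g_rel = [g[i] - gr[i] for i in range(3)]
        total += math.dist(p_rel, g_rel)  # math.dist requires Python 3.8+
    return total / len(pred)


# Two joints: the root matches, the second joint is off by 3 units -> mean 1.5.
print(mpjpe([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]))
```

Because both poses are root-aligned first, a global translation of the prediction does not change the score; only the relative joint layout is measured.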