RAM: Recover Any 3D Human Motion in-the-Wild

About

RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity association under severe occlusions and dynamic interactions. A memory-augmented Temporal HMR module further enhances human motion reconstruction by injecting spatio-temporal priors for consistent and smooth motion estimation. Moreover, a lightweight Predictor module forecasts future poses to maintain reconstruction continuity, while a gated combiner adaptively fuses reconstructed and predicted features to ensure coherence and robustness. Experiments on in-the-wild multi-person benchmarks such as PoseTrack and 3DPW demonstrate that RAM substantially outperforms previous state-of-the-art methods in both zero-shot tracking stability and 3D accuracy, offering a generalizable paradigm for markerless 3D human motion capture in-the-wild.

Sen Jia, Ning Zhu, Jinqin Zhong, Jiale Zhou, Huaping Zhang, Jenq-Neng Hwang, Lei Li • 2026
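
The abstract mentions a gated combiner that fuses reconstructed and predicted features, but gives no formulation. As a rough illustration only, the sketch below shows one common way such a gate is built: a per-dimension sigmoid gate computed from the concatenated inputs, then a convex blend of the two vectors. The function name `gated_combine`, the parameters `weights` and `bias`, and the gate formula itself are assumptions made for this example and are not taken from RAM.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_combine(
    reconstructed: Sequence[float],
    predicted: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Blend two feature vectors with a per-dimension gate (hypothetical form).

    For feature dimension d, `weights` is assumed to have d rows of length 2*d
    and `bias` is assumed to have length d. The gate for dimension i is
    g_i = sigmoid(weights[i] . [reconstructed; predicted] + bias[i]),
    and the output is g_i * reconstructed[i] + (1 - g_i) * predicted[i].
    """
    d = len(reconstructed)
    if len(predicted) != d:
        raise ValueError("reconstructed and predicted must have the same length")

    # Concatenate both inputs so the gate can look at each of them.
    concat = list(reconstructed) + list(predicted)

    fused = []
    for i in range(d):
        logit = bias[i] + sum(w * x for w, x in zip(weights[i], concat))
        g = sigmoid(logit)
        # g near 1 trusts the reconstruction; g near 0 trusts the prediction.
        fused.append(g * reconstructed[i] + (1.0 - g) * predicted[i])
    return fused


if __name__ == "__main__":
    zero_weights = [[0.0] * 4 for _ in range(2)]
    # With zero weights and zero bias every gate is 0.5, so the result is the mean.
    print(gated_combine([1.0, 3.0], [3.0, 5.0], zero_weights, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, so the blend could lean on the predicted features when the reconstruction is unreliable (for example under occlusion); whether RAM's combiner works this way is not stated in the abstract.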

Related benchmarks

Task	Dataset	Result
2D Pose Estimation	COCO (test val)	PCK@0.05 89	8
2D Pose Estimation	PoseTrack (test val)	PCK@0.05 93	8
3D Pose Estimation	3DPW (test val)	MPJPE 53	8
Multiple Object Tracking	PoseTrack18	HOTA 66.4	5
human motion tracking	TrackID3x3 Indoor 1.0	TI-HOTA 75.07	4
human motion tracking	TrackID3x3 Outdoor 1.0	TI-HOTA 66.68	4
Multiple Object Tracking	PoseTrack 21	MOTA 74.4	4

