Human3R: Everyone Everywhere All at Once
About
We present Human3R, a unified, feed-forward framework for online 4D human-scene reconstruction in the world frame from casually captured monocular videos. Previous approaches rely on multi-stage pipelines, iterative contact-aware refinement between humans and scenes, and heavy dependencies such as human detection, depth estimation, and SLAM pre-processing. In contrast, Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), dense 3D scenes ("everywhere"), and camera trajectories in a single forward pass ("all at once"). Our method builds upon the 4D online reconstruction model CUT3R and uses parameter-efficient visual prompt tuning to preserve CUT3R's rich spatiotemporal priors while enabling direct readout of multiple SMPL-X bodies. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. After being trained on the relatively small-scale synthetic dataset BEDLAM for just one day on one GPU, it achieves superior performance with remarkable efficiency: it reconstructs multiple humans in a one-shot manner, together with the 3D scene, in a single stage, in real time (15 FPS) with a low memory footprint (8 GB). Extensive experiments demonstrate that Human3R delivers state-of-the-art or competitive performance with a single unified model across tasks including global human motion estimation, local human mesh recovery, video depth estimation, and camera pose estimation. We hope that Human3R will serve as a simple yet strong baseline that can be easily adapted for downstream applications. Code, models, and 4D interactive demos are available at https://fanegg.github.io/Human3R/.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Human Mesh Recovery | 3DPW (test) | MPJPE | 71.2 | 299 |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 44.1 | 127 |
| Human Mesh Recovery | MPI-INF-3DHP | MPJPE | 106.4 | 35 |
| Human Mesh Reconstruction | EMDB 24 joints (test) | PA-MPJPE | 48.5 | 30 |
| Global human motion estimation | RICH | WA-MPJPE | 110 | 21 |
| Human global trajectory and motion reconstruction | EMDB 2 | WA-MPJPE100 | 112.2 | 17 |
| Human Mesh Recovery | MoYo | MPJPE | 149.7 | 16 |
| Global motion and trajectory estimation | EMDB 2 | WA-MPJPE | 112.2 | 15 |
| Camera-coordinate Human Mesh Recovery | EMDB-1 (test) | PA-MPJPE | 48.5 | 13 |
| Human Mesh Recovery | RICH | PA-MPVPE | 56.3 | 13 |
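
For readers unfamiliar with the metrics in the table, the following is a minimal NumPy sketch of how MPJPE and PA-MPJPE are commonly computed for a single frame. It is not taken from the Human3R code base: the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, and the sketch assumes joints are given as `(J, 3)` arrays in consistent units. Benchmark protocols usually apply further steps, such as root alignment before MPJPE, which are omitted here. PA-MPVPE follows the same recipe with mesh vertices in place of joints.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Similarity-align pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation from the SVD of the 3x3 cross-covariance (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Least-squares optimal isotropic scale.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Procrustes-aligned MPJPE: error that remains after similarity alignment."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(24, 3))  # 24 synthetic joints, illustration only
    pred = gt + rng.normal(scale=0.05, size=gt.shape)
    print("MPJPE:", mpjpe(pred, gt))
    print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because Procrustes alignment removes global scale, rotation, and translation, PA-MPJPE measures only the articulated pose error, which is why it is typically lower than MPJPE on the same predictions.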