
CoMotion: Concurrent Multi-person 3D Motion

About

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion
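The control flow described above (update existing tracks directly from each new image, and use per-frame detection to pick up people) can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the released implementation: `track_online`, `detect`, `update_pose`, and `is_same_person` are hypothetical names, a pose is reduced to a plain list of floats, and the rule that a detection starts a new track only when no existing track already covers it is an assumption that the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import List

Pose = List[float]  # toy stand-in for a full 3D pose


@dataclass
class Track:
    track_id: int
    pose: Pose


def track_online(frames, detect, update_pose, is_same_person):
    """Toy online loop: refresh tracks from each frame, then add new people.

    detect, update_pose, and is_same_person are hypothetical callables
    standing in for learned components.
    """
    tracks: List[Track] = []
    next_id = 0
    history = []
    for frame in frames:
        # 1) Update every existing track directly from the new image.
        #    Detections are not matched to tracks in this step.
        for t in tracks:
            t.pose = update_pose(t.pose, frame)
        # 2) Per-frame detection only spawns tracks for people that no
        #    existing track already covers (assumed rule).
        for det in detect(frame):
            if not any(is_same_person(t.pose, det) for t in tracks):
                tracks.append(Track(next_id, det))
                next_id += 1
        # Record a snapshot of all track poses after this frame.
        history.append({t.track_id: list(t.pose) for t in tracks})
    return history
```

Because the update step needs no detection to attach to, a track keeps its identity through a frame in which the person is not detected, provided `update_pose` returns a reasonable pose for that frame.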

Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan R. Richter, Vladlen Koltun • 2025

Related benchmarks

Task                      Dataset               Result          Rank
Human Mesh Recovery       3DPW                  PA-MPJPE 37.3   140
Human Mesh Recovery       EMDB                  MPJPE 73.5      16
Human Mesh Recovery       RICH                  PA-MPVPE 128.7  13
3D Pose Estimation        3DPW (test val)       MPJPE 60        8
2D Pose Estimation        PoseTrack (test val)  PCK@0.05 88     8
2D Pose Estimation        COCO (test val)       PCK@0.05 79     8
Human Mesh Recovery       Sim-Geometry          PA-PVE 87.9     6
Human Mesh Recovery       Sim-Visual            IoU 38          6
Multiple Object Tracking  PoseTrack18           HOTA 58.2       5
Multiple Object Tracking  PoseTrack 21          MOTA 71.4       4
Showing 10 of 12 rows
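
For readers unfamiliar with the error metrics in these rows, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows that definition on made-up coordinates. The function name `mpjpe` and the sample values are illustrative assumptions, not the evaluation code behind the numbers above, and the listing does not state the units of its results. The PA- variants (such as PA-MPJPE) first rigidly align the prediction to the ground truth with a Procrustes fit before measuring the error, a step omitted here.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Two joints with made-up coordinates: errors of 0 and 5 average to 2.5.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt))  # 2.5
```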
