CoMotion: Concurrent Multi-person 3D Motion

About

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion

Alejandro Newell, Peiyun Hu, Lahav Lipson, Stephan R. Richter, Vladlen Koltun • 2025

Related benchmarks

Task	Dataset	Result
Human Mesh Recovery	3DPW	PA-MPJPE 37.3	159
Human Mesh Recovery	RICH	--	19
Human Mesh Recovery	EMDB	MPJPE 73.5	16
3D Pose Estimation	3DPW (test val)	MPJPE 60	8
2D Pose Estimation	PoseTrack (test val)	PCK@0.05 88	8
2D Pose Estimation	COCO (test val)	PCK@0.05 79	8
Human Mesh Recovery	Sim-Geometry	PA-PVE 87.9	6
Human Mesh Recovery	Sim-Visual	IoU 38	6
Multiple Object Tracking	PoseTrack18	HOTA 58.2	5
Multiple Object Tracking	PoseTrack 21	MOTA 71.4	4
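
For readers unfamiliar with the pose metrics in the table, the following is a minimal sketch of how MPJPE and PCK are commonly computed. It is not taken from the CoMotion code. The function names and the `scale` argument are illustrative assumptions, and the normalization used for PCK differs between benchmarks (the listing does not say which one applies here). PA-MPJPE is the same error as MPJPE, measured after rigidly aligning the prediction to the ground truth (Procrustes alignment), which this sketch omits.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: the average Euclidean distance
    between predicted and ground-truth joints (lower is better)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def pck(pred, gt, scale, threshold=0.05):
    """Percentage of correct keypoints: the share of joints whose error
    is at most threshold * scale (higher is better).

    `scale` is an assumed normalization length (for example a person
    bounding-box size); benchmarks define it differently.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    radius = threshold * scale
    hits = sum(math.dist(p, g) <= radius for p, g in zip(pred, gt))
    return 100.0 * hits / len(pred)

# Toy 3D example: joint errors of 3 and 4 units average to 3.5.
pred_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_3d = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = mpjpe(pred_3d, gt_3d)

# Toy 2D example: with scale 100 and threshold 0.05 the radius is 5,
# so only the first joint (error 1) counts as correct.
pred_2d = [(0.0, 0.0), (10.0, 0.0)]
gt_2d = [(0.0, 1.0), (10.0, 20.0)]
score = pck(pred_2d, gt_2d, scale=100.0)
```

MPJPE-style errors are usually reported in millimetres, and PCK as a percentage, which is consistent with the magnitudes in the table above.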

