CoMotion: Concurrent Multi-person 3D Motion
About
We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a model that matches state-of-the-art systems in 3D pose estimation accuracy while being faster and more accurate in tracking multiple people through time. Code and weights are provided at https://github.com/apple/ml-comotion
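The update-instead-of-match idea described above can be illustrated with a minimal sketch. This is not the CoMotion API or the authors' implementation: `track_online`, `Track`, and the callables `detect`, `update_pose`, and `is_new` are hypothetical names, the rule for spawning new tracks is an assumption the abstract does not specify, and the toy 1-D stand-ins at the bottom only mimic what a learned model would do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Pose = List[float]  # placeholder for a person's pose parameters (hypothetical)
Frame = List[float]  # toy "image": 1-D positions of the visible people


@dataclass
class Track:
    track_id: int
    pose: Pose


def track_online(
    frames: Sequence[Frame],
    detect: Callable[[Frame], List[Pose]],
    update_pose: Callable[[Pose, Frame], Pose],
    is_new: Callable[[Pose, List[Pose]], bool],
) -> List[Dict[int, Pose]]:
    """Online loop: update every existing track from the new frame, then
    use per-frame detections only to start tracks for unseen people."""
    tracks: List[Track] = []
    next_id = 0
    history: List[Dict[int, Pose]] = []
    for frame in frames:
        # Step 1: poses are updated directly from the new image;
        # no detection-to-track matching across time is performed.
        for track in tracks:
            track.pose = update_pose(track.pose, frame)
        # Step 2 (assumption): detections not explained by any current
        # track start a new track.
        for det in detect(frame):
            if is_new(det, [t.pose for t in tracks]):
                tracks.append(Track(next_id, det))
                next_id += 1
        history.append({t.track_id: list(t.pose) for t in tracks})
    return history


# Toy stand-ins for the learned components (illustrative only).
def toy_detect(frame: Frame) -> List[Pose]:
    return [[x] for x in frame]


def toy_update(pose: Pose, frame: Frame) -> Pose:
    nearby = [x for x in frame if abs(x - pose[0]) <= 2.0]
    # If the person is occluded (nothing nearby), keep the last estimate.
    return [min(nearby, key=lambda x: abs(x - pose[0]))] if nearby else pose


def toy_is_new(det: Pose, poses: List[Pose]) -> bool:
    return all(abs(det[0] - p[0]) > 2.0 for p in poses)


# The second person is missing from the third frame (occlusion).
frames = [[0.0, 10.0], [0.5, 10.5], [1.0], [1.5, 11.5]]
history = track_online(frames, toy_detect, toy_update, toy_is_new)
```

In this toy run the occluded person keeps the same track identity across the gap, because the track persists and is simply updated again once the person reappears.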
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human Mesh Recovery | 3DPW | PA-MPJPE | 37.3 | 140 |
| Human Mesh Recovery | EMDB | MPJPE | 73.5 | 16 |
| Human Mesh Recovery | RICH | PA-MPVPE | 128.7 | 13 |
| 3D Pose Estimation | 3DPW (test val) | MPJPE | 60 | 8 |
| 2D Pose Estimation | PoseTrack (test val) | PCK@0.05 | 88 | 8 |
| 2D Pose Estimation | COCO (test val) | PCK@0.05 | 79 | 8 |
| Human Mesh Recovery | Sim-Geometry | PA-PVE | 87.9 | 6 |
| Human Mesh Recovery | Sim-Visual | IoU | 38 | 6 |
| Multiple Object Tracking | PoseTrack18 | HOTA | 58.2 | 5 |
| Multiple Object Tracking | PoseTrack 21 | MOTA | 71.4 | 4 |
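For readers unfamiliar with the metric names in the table, the following is a minimal sketch of two of them under their common definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth joints, and PCK as the fraction of keypoints falling within a threshold of `alpha` times a reference size. The function names and the `ref_size` normalization are assumptions for illustration; benchmarks differ in how they normalize PCK (for example by bounding-box or head size), and this is not the evaluation code used to produce the numbers above.

```python
import math
from typing import Sequence

Point = Sequence[float]


def mpjpe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean per-joint position error: average Euclidean distance
    between corresponding predicted and ground-truth joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pck(pred: Sequence[Point], gt: Sequence[Point],
        ref_size: float, alpha: float = 0.05) -> float:
    """Fraction of keypoints whose error is at most alpha * ref_size.
    ref_size is an assumed normalizer (e.g. a bounding-box dimension)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    threshold = alpha * ref_size
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(pred)


# Two 3D joints with errors of 3 and 4 units.
example_mpjpe = mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)])

# Two 2D keypoints with errors of 1 and 10 pixels, threshold 0.05 * 100 = 5.
example_pck = pck([(0, 0), (10, 10)], [(0, 1), (10, 20)], ref_size=100)
```

The PA- prefixed variants in the table apply a rigid Procrustes alignment (scale, rotation, translation) to the prediction before computing the error; that step is omitted here.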