Associative Embedding: End-to-End Learning for Joint Detection and Grouping
About
We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner, including multi-person pose estimation, instance segmentation, and multi-object tracking. Usually the grouping of detections is achieved with multi-stage pipelines; instead, we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to both multi-person pose estimation and instance segmentation, and report state-of-the-art performance for multi-person pose on the MPII and MS-COCO datasets.
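To make the idea of jointly supervising detections and group assignments more concrete, here is a minimal sketch of a grouping loss on per-detection "tags". The abstract does not state the training objective, so the pull/push form below, the scalar tags, the `sigma` parameter, and the function name `grouping_loss` are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def grouping_loss(tags, sigma=1.0):
    """Illustrative pull/push loss on scalar tags (assumed form).

    tags: list with one 1-D array per person; each array holds the tag
          values read out at that person's ground-truth keypoint locations.
    """
    n = len(tags)
    # Reference tag per person: the mean of that person's keypoint tags.
    refs = np.array([np.mean(t) for t in tags])
    # Pull term: tags belonging to one person should match its reference.
    pull = sum(np.sum((np.asarray(t) - r) ** 2) for t, r in zip(tags, refs)) / n
    # Push term: reference tags of different people should be far apart.
    # The n == n' pairs are included here and only add a constant.
    diff = refs[:, None] - refs[None, :]
    push = np.sum(np.exp(-diff ** 2 / (2 * sigma ** 2))) / n ** 2
    return pull + push

# Two people whose keypoints carry consistent, well-separated tags ...
separated = [np.array([0.0, 0.0, 0.0]), np.array([5.0, 5.0, 5.0])]
# ... versus two people whose keypoint tags are mixed together.
mixed = [np.array([0.0, 5.0, 0.0]), np.array([5.0, 0.0, 5.0])]
print(grouping_loss(separated) < grouping_loss(mixed))
```

Under this assumed loss, only the relative tag values matter: the network is free to pick any tag for a person as long as that person's detections agree with each other and differ from everyone else's, which is what allows grouping to be read directly from the output.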
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human Pose Estimation | COCO (test-dev) | AP | 65.5 | 408 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP | 69.9 | 386 |
| Human Pose Estimation | MPII (test) | Shoulder PCK | 89.3 | 314 |
| Human Pose Estimation | COCO 2017 (test-dev) | AP | 68.4 | 180 |
| Instance Segmentation | PASCAL VOC 2012 (val) | mAP @0.5 | 35.1 | 173 |
| Multi-person Pose Estimation | COCO (test-dev) | AP | 65.5 | 101 |
| Multi-person Pose Estimation | COCO 2017 (test-dev) | AP | 65.5 | 99 |
| Pose Estimation | OCHuman (test) | AP | 34.8 | 95 |
| Whole-body Pose Estimation | COCO-Wholebody 1.0 (val) | Body AP | 58 | 64 |
| Human Pose Estimation | COCO (val) | AP | 61.3 | 53 |