Associative Embedding: End-to-End Learning for Joint Detection and Grouping

About

We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner, including multi-person pose estimation, instance segmentation, and multi-object tracking. Usually the grouping of detections is achieved with multi-stage pipelines; instead, we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to both multi-person pose estimation and instance segmentation and report state-of-the-art performance for multi-person pose on the MPII and MS-COCO datasets.

Alejandro Newell, Zhiao Huang, Jia Deng • 2016
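
The idea in the abstract, a network that emits both detections and a group assignment for each detection, can be illustrated with a small sketch. It assumes each detection carries a scalar "tag" and that detections with similar tags belong to the same group. The function names (`grouping_loss`, `group_by_tag`), the pull/push form of the loss, the `sigma` and `threshold` parameters, and the greedy decoding are assumptions made for illustration. They are not taken from this page and should not be read as the authors' implementation.

```python
import math


def grouping_loss(tags_per_group, sigma=1.0):
    """Illustrative pull/push loss over scalar tags (an assumed form).

    tags_per_group: one list of tag values per ground-truth group.
    Pull term: squared distance of each tag to its group's mean tag.
    Push term: Gaussian penalty when two different groups' means are close.
    """
    n = len(tags_per_group)
    means = [sum(tags) / len(tags) for tags in tags_per_group]
    pull = sum(
        (mean - tag) ** 2
        for mean, tags in zip(means, tags_per_group)
        for tag in tags
    ) / n
    push = sum(
        math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
        for i, a in enumerate(means)
        for j, b in enumerate(means)
        if i != j
    ) / (n * n)
    return pull + push


def group_by_tag(detections, threshold=0.5):
    """Greedy decoding (hypothetical): attach each detection to the
    existing group with the nearest mean tag, or start a new group
    if no group mean lies within `threshold`.

    detections: list of (label, tag) pairs.
    """
    groups = []
    for label, tag in detections:
        best, best_dist = None, threshold
        for group in groups:
            mean = sum(t for _, t in group) / len(group)
            dist = abs(tag - mean)
            if dist < best_dist:
                best, best_dist = group, dist
        if best is None:
            groups.append([(label, tag)])
        else:
            best.append((label, tag))
    return groups


# Two people, two joints each; tags cluster near 0.1 and 0.9.
detections = [("head", 0.10), ("head", 0.92), ("wrist", 0.12), ("wrist", 0.88)]
groups = group_by_tag(detections)
```

In this sketch the loss is lower when tags agree within a group and differ across groups, which is what makes the simple nearest-tag decoding workable at inference time.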

Related benchmarks

Task	Dataset	Result
Human Pose Estimation	COCO (test-dev)	AP 65.5	432
2D Human Pose Estimation	COCO 2017 (val)	AP 69.9	386
Human Pose Estimation	MPII (test)	Shoulder PCK 89.3	350
Human Pose Estimation	COCO 2017 (test-dev)	AP 68.4	180
Instance Segmentation	PASCAL VOC 2012 (val)	mAP @0.5 35.1	173
Multi-person Pose Estimation	COCO (test-dev)	AP 65.5	101
Multi-person Pose Estimation	COCO 2017 (test-dev)	AP 65.5	99
Pose Estimation	OCHuman (test)	AP 34.8	95
Pose Estimation	COCO 2017 (val)	AP 61.3	71
Whole-body Pose Estimation	COCO-Wholebody 1.0 (val)	Body AP 58	64
