
Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

About

In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework, which has previously been inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately requires learning representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through a pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to their respective keypoint regions, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose. The code and models are available at https://github.com/HRNet/DEKR.
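The multi-branch idea above can be sketched in a few lines: shared backbone features are split into one group per keypoint, each branch warps its own features with per-pixel sampling offsets (an adaptive convolution in spirit), and then regresses one keypoint offset map. This is a minimal NumPy illustration, not the authors' implementation: `dekr_head` and `adaptive_sample` are hypothetical names, random weight matrices stand in for learned convolutions, and nearest-neighbor gathering stands in for bilinear deformable sampling.

```python
import numpy as np

def adaptive_sample(feat, offsets):
    # feat: (C, H, W); offsets: (H, W, 2) per-pixel sampling displacements.
    # Gather features at the displaced locations (nearest-neighbor, clipped
    # to the feature map; real deformable convs use bilinear interpolation).
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(np.round(ys + offsets[..., 0]).astype(int), 0, H - 1)
    sx = np.clip(np.round(xs + offsets[..., 1]).astype(int), 0, W - 1)
    return feat[:, sy, sx]  # (C, H, W) warped features

def dekr_head(feat, num_keypoints, rng):
    # Hypothetical disentangled head: split channels into one group per
    # keypoint; each branch predicts its own sampling offsets and then
    # regresses a 2-channel center-to-keypoint offset map.
    C, H, W = feat.shape
    cb = C // num_keypoints
    preds = []
    for k in range(num_keypoints):
        branch = feat[k * cb:(k + 1) * cb]           # this branch's features
        w_off = rng.normal(size=(2, cb)) * 0.01      # stand-in for offset conv
        offsets = np.einsum("oc,chw->hwo", w_off, branch)
        warped = adaptive_sample(branch, offsets)    # attend near the keypoint
        w_reg = rng.normal(size=(2, cb)) * 0.01      # stand-in for regressor
        preds.append(np.einsum("oc,chw->ohw", w_reg, warped))
    return np.stack(preds)  # (K, 2, H, W) per-keypoint offset maps

rng = np.random.default_rng(0)
feat = rng.normal(size=(34, 16, 16))       # toy backbone output
out = dekr_head(feat, num_keypoints=17, rng=rng)
print(out.shape)  # (17, 2, 16, 16)
```

The point of the per-branch split is disentanglement: each keypoint's regressor only sees (and only adapts) its own feature group, so no branch has to share representation capacity with the other keypoints.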

Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang · 2021

Related benchmarks

Task | Dataset | Result | Rank
Human Pose Estimation | COCO (test-dev) | AP 71 | 408
2D Human Pose Estimation | COCO 2017 (val) | AP 72.1 | 386
Pose Estimation | COCO (val) | AP 72.3 | 319
Human Pose Estimation | COCO 2017 (test-dev) | AP 71 | 180
Multi-person Pose Estimation | CrowdPose (test) | AP 68 | 177
Multi-person Pose Estimation | COCO (test-dev) | AP 71 | 101
Pose Estimation | OCHuman (test) | AP 38.2 | 95
Multi-person Pose Estimation | OCHuman (val) | AP 38.8 | 40
Pose Estimation | OCHuman (val) | AP 37.9 | 24
Human Pose Estimation | ExLPose-OCN (test) | AP@0.5:0.95 (A7M3) 27.1 | 23

(Showing 10 of 18 rows.)

Other info

Code: https://github.com/HRNet/DEKR