Poseur: Direct Human Pose Regression with Transformers
About
We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features that are most relevant to the target keypoints, considerably improving the accuracy. Importantly, our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose-estimation datasets, demonstrate that our method significantly improves upon the state-of-the-art in regression-based pose estimation. More notably, ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
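The paragraph above describes regressing keypoint coordinates directly from image features via an attention mechanism, rather than decoding heatmaps. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: a set of learnable keypoint queries attends over flattened image features, and a linear head maps each attended vector to a normalized (x, y) coordinate. All names, shapes, and the single-head attention form are illustrative assumptions.

```python
# Hypothetical sketch (NOT the Poseur code): K keypoint queries attend over
# N flattened image features of dimension D, then a linear regression head
# predicts one (x, y) coordinate per keypoint, squashed to [0, 1] by a sigmoid.
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def regress_keypoints(feats, queries, w_out, b_out):
    """feats: (N, D) flattened image features; queries: (K, D) keypoint queries;
    w_out: (D, 2), b_out: (2,) regression head.
    Returns (K, 2) normalized keypoint coordinates in (0, 1)."""
    # Scaled dot-product attention: each query adaptively weights the features.
    attn = softmax(queries @ feats.T / np.sqrt(feats.shape[1]))  # (K, N)
    attended = attn @ feats                                       # (K, D)
    # Direct regression to coordinates -- no intermediate heatmap.
    return 1.0 / (1.0 + np.exp(-(attended @ w_out + b_out)))     # sigmoid

rng = np.random.default_rng(0)
N, D, K = 64, 32, 17  # 17 keypoints, as in the COCO skeleton
coords = regress_keypoints(rng.normal(size=(N, D)),
                           rng.normal(size=(K, D)),
                           rng.normal(size=(D, 2)),
                           np.zeros(2))
print(coords.shape)  # one (x, y) pair per keypoint
```

Because the whole pipeline is a few differentiable matrix operations, gradients flow from the coordinate loss back to both the queries and the features, which is what makes the framework end-to-end trainable.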
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Human Pose Estimation | COCO (test-dev) | 78.3 AP | 408 |
| 2D Human Pose Estimation | COCO 2017 (val) | 76.8 AP | 386 |
| Pose Estimation | COCO (val) | 79.6 AP | 319 |
| Whole-body Pose Estimation | COCO-Wholebody 1.0 (val) | 68.5 Body AP | 64 |
| 2D Human Pose Estimation | MPII (val) | -- | 61 |
| Human Pose Estimation | PoseTrack 2017 (val) | -- | 54 |
| 2D Occluded Pose Estimation | SyncOCC 1.0 (test) | 93.1 AP^OC | 10 |
| 2D Occluded Pose Estimation | SyncOCC-H 1.0 | 78.5 AP^OC | 10 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (test) | 45.6 AP^OC | 10 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (val) | 44.4 AP^OC | 10 |