Poseur: Direct Human Pose Regression with Transformers

About

We propose a direct, regression-based approach to 2D human pose estimation from single images. We formulate the problem as a sequence prediction task, which we solve using a Transformer network. This network directly learns a regression mapping from images to the keypoint coordinates, without resorting to intermediate representations such as heatmaps. This approach avoids much of the complexity associated with heatmap-based approaches. To overcome the feature misalignment issues of previous regression-based methods, we propose an attention mechanism that adaptively attends to the features that are most relevant to the target keypoints, considerably improving the accuracy. Importantly, our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Experiments on MS-COCO and MPII, two predominant pose-estimation datasets, demonstrate that our method significantly improves upon the state-of-the-art in regression-based pose estimation. More notably, ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.

Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton van den Hengel • 2022
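
To make the idea of regressing keypoints directly from attended features more concrete, here is a minimal NumPy sketch. It is not the authors' architecture: it assumes a single cross-attention step with random, untrained weights, a hypothetical function name `regress_keypoints`, 17 keypoints (the COCO convention), and a sigmoid that maps outputs to normalized image coordinates. It only illustrates how one learned query per keypoint can attend over image feature tokens and produce an (x, y) pair without any intermediate heatmap.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def regress_keypoints(features, queries, w_q, w_k, w_v, w_out):
    """Hypothetical single-step keypoint regression via cross-attention.

    features: (N, D) flattened image feature tokens
    queries:  (K, D) one query vector per keypoint
    Returns normalized (x, y) coordinates of shape (K, 2) and the
    attention weights of shape (K, N).
    """
    q = queries @ w_q                      # (K, D)
    k = features @ w_k                     # (N, D)
    v = features @ w_v                     # (N, D)
    # Each keypoint query weights the image tokens by relevance.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (K, N)
    attended = attn @ v                    # (K, D)
    # Linear head plus sigmoid: coordinates come out directly, no heatmap.
    coords = 1.0 / (1.0 + np.exp(-(attended @ w_out)))       # (K, 2)
    return coords, attn


# Toy inputs: all sizes and weights below are illustrative assumptions.
rng = np.random.default_rng(0)
num_tokens, dim, num_keypoints = 64, 32, 17
features = rng.normal(size=(num_tokens, dim))
queries = rng.normal(size=(num_keypoints, dim))
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
w_out = rng.normal(scale=dim ** -0.5, size=(dim, 2))

coords, attn = regress_keypoints(features, queries, w_q, w_k, w_v, w_out)
print(coords.shape)  # (17, 2)
```

Because every operation here is differentiable, a loss on `coords` could in principle be backpropagated to the queries and projection weights, which is the sense in which a pipeline of this kind is end-to-end trainable.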

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Human Pose Estimation | COCO (test-dev) | AP 78.3 | 408 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP 76.8 | 386 |
| Pose Estimation | COCO (val) | AP 79.6 | 319 |
| Whole-body Pose Estimation | COCO-Wholebody 1.0 (val) | Body AP 68.5 | 64 |
| 2D Human Pose Estimation | MPII (val) | -- | 61 |
| Human Pose Estimation | PoseTrack 2017 (val) | -- | 54 |
| 2D Occluded Pose Estimation | SyncOCC 1.0 (test) | AP^OC 93.1 | 10 |
| 2D Occluded Pose Estimation | SyncOCC-H 1.0 | AP^OC 78.5 | 10 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (test) | AP^OC 45.6 | 10 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (val) | AP^OC 44.4 | 10 |
Showing 10 of 11 rows
