
Human Pose as Compositional Tokens

About

Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings. While these representations are easy to process, they admit unrealistic pose estimates because they do not model the dependencies between body joints. In this paper, we present a structured representation, named Pose as Compositional Tokens (PCT), to explore the joint dependency. It represents a pose by M discrete tokens, each characterizing a sub-structure of several interdependent joints. The compositional design enables it to achieve a small reconstruction error at a low cost. We then cast pose estimation as a classification task. In particular, we learn a classifier to predict the categories of the M tokens from an image. A pre-learned decoder network recovers the pose from the tokens without further post-processing. We show that it achieves pose estimation results better than or comparable to existing methods in general scenarios, and continues to work well under occlusion, which is ubiquitous in practice. The code and models are publicly available at https://github.com/Gengzigang/PCT.

Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu • 2023
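
The inference path described in the abstract (a classifier predicts a category for each of the M tokens, and a pre-learned decoder maps the tokens back to a pose) can be illustrated with a toy sketch. This is not the authors' implementation: the codebook lookup, the linear decoder, the random weights, and all sizes (`M`, `V`, `D`, `K`) are hypothetical stand-ins chosen only to show the data flow.

```python
import random

# All sizes below are hypothetical, chosen only for illustration.
M = 4   # number of discrete tokens per pose
V = 8   # number of token categories (assumed codebook size)
D = 3   # assumed dimension of each codebook entry
K = 5   # number of body joints in this toy example

rng = random.Random(0)

# Assumed codebook: one D-dimensional vector per token category.
# Random values stand in for learned entries.
codebook = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(V)]

# Stand-in for the pre-learned decoder: a fixed linear map from the
# M concatenated token vectors (M * D values) to K (x, y) coordinates.
decoder_w = [[rng.gauss(0, 0.1) for _ in range(M * D)] for _ in range(K * 2)]


def classify_tokens(logits):
    """Return the highest-scoring category index for each token (row-wise argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def decode_pose(token_ids):
    """Look up each token's codebook vector and map them to K (x, y) joints."""
    feats = [x for t in token_ids for x in codebook[t]]
    flat = [sum(w * f for w, f in zip(row, feats)) for row in decoder_w]
    return [(flat[2 * k], flat[2 * k + 1]) for k in range(K)]


# Stand-in for the classifier output on one image: M rows of V logits.
logits = [[rng.gauss(0, 1) for _ in range(V)] for _ in range(M)]

token_ids = classify_tokens(logits)   # M category indices
pose = decode_pose(token_ids)         # K (x, y) pairs, no post-processing
```

Because the decoder only ever sees combinations of token vectors, every output pose is drawn from what the decoder can produce from the token set, which is one way to read the abstract's claim that the representation avoids unrealistic poses.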

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Human Pose Estimation | Human3.6M (test) | -- | 547 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP 79.3 | 386 |
| Human Pose Estimation | MPII (test) | Shoulder PCK 97.8 | 314 |
| Human Pose Estimation | COCO 2017 (test-dev) | AP 78.3 | 180 |
| 2D Human Pose Estimation | MPII (val) | Head 97.5 | 61 |
| 3D Human Pose Estimation | 3DPW cross-dataset (test) | PA-MPJPE 53.9 | 27 |
| Pose Estimation | COCO 2017 (val) | AP 79.3 | 23 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (val) | AP^OC 50.8 | 10 |
| 2D Occluded Pose Estimation | OCHuman 1.0 (test) | AP^OC 49.6 | 10 |
| 2D Occluded Pose Estimation | CrowdPose 1.0 (test) | AP^OC 77.2 | 10 |
