TokenPose: Learning Keypoint Tokens for Human Pose Estimation

About

Human pose estimation relies heavily on visual cues and anatomical constraints between body parts to locate keypoints. Most existing CNN-based methods learn strong visual representations but lack the ability to explicitly learn the constraint relationships between keypoints. In this paper, we propose a novel approach based on Token representation for human Pose estimation (TokenPose). Specifically, each keypoint is explicitly embedded as a token so that constraint relationships and appearance cues are learned simultaneously from images. Extensive experiments show that the small and large TokenPose models are on par with state-of-the-art CNN-based counterparts while being more lightweight. TokenPose-S and TokenPose-L achieve 72.5 AP and 75.8 AP on the COCO validation dataset, respectively, with significant reductions in parameters (↓80.6%; ↓56.8%) and GFLOPs (↓75.3%; ↓24.7%). Code is publicly available.
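
The idea of embedding each keypoint as a token can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes keypoint tokens are simply prepended to a sequence of visual patch embeddings and passed through one self-attention step, so each keypoint token can attend both to image patches (appearance cues) and to the other keypoint tokens (constraint relationships). The names `build_token_sequence` and `self_attention`, the sizes `NUM_PATCHES` and `DIM`, the random initialization, and the identity query/key/value projections are all hypothetical simplifications chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    A real model would use learned projections and several layers;
    identity projections keep this sketch short (an assumption).
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights


def build_token_sequence(num_keypoints, patch_embeddings, rng):
    """Prepend one token per keypoint to the visual patch embeddings.

    Keypoint tokens would be learned parameters in a trained model;
    here they are randomly initialized placeholders.
    """
    dim = len(patch_embeddings[0])
    keypoint_tokens = [
        [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_keypoints)
    ]
    return keypoint_tokens + patch_embeddings


rng = random.Random(0)
NUM_KEYPOINTS = 17  # COCO annotates 17 body keypoints
NUM_PATCHES = 12    # hypothetical number of image patches
DIM = 8             # hypothetical embedding size

# Stand-in for patch embeddings that a visual backbone would produce.
patches = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_PATCHES)]

tokens = build_token_sequence(NUM_KEYPOINTS, patches, rng)
updated, attn = self_attention(tokens)

# The first NUM_KEYPOINTS outputs correspond to the keypoint tokens.
keypoint_outputs = updated[:NUM_KEYPOINTS]
```

In this sketch, each row of `attn` for a keypoint token spans both the other keypoint tokens and the patch tokens, which is the mechanism by which a single token could mix anatomical constraints with appearance information. How the updated keypoint tokens are turned into keypoint locations is not covered here.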

Yanjie Li, Shoukui Zhang, Zhicheng Wang, Sen Yang, Wankou Yang, Shu-Tao Xia, Erjin Zhou • 2021

Related benchmarks

Task	Dataset	Result
Human Pose Estimation	COCO (test-dev)	AP 75.9	432
2D Human Pose Estimation	COCO 2017 (val)	AP 75.8	386
Pose Estimation	COCO (val)	AP 75.9	319
Human Pose Estimation	COCO 2017 (test-dev)	AP 75.9	180
2D Human Pose Estimation	MPII (val)	Head 97.1	61
Keypoint Detection	COCO (val)	AP 75.8	60
Pose Estimation	COCO	mAP 75.8	30
Human Pose Estimation	COCO 2014 (val)	AP 75.8	18
Animal Pose Estimation	AP-10K (val)	AP 72.7	17
Human Pose Estimation	infant pose estimation dataset (test)	AP 93	6

