
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

About

Transformer architectures have achieved SOTA performance on human mesh recovery (HMR) from monocular images. However, this performance gain has come at the cost of substantial memory and computational overhead. A lightweight and efficient model that reconstructs accurate human meshes is needed for real-world applications. In this paper, we propose a pure transformer architecture named POoling aTtention TransformER (POTTER) for the HMR task from single images. Observing that the conventional attention module is memory-intensive and computationally expensive, we propose an efficient pooling attention module, which significantly reduces the memory and computational cost without sacrificing performance. Furthermore, we design a new transformer architecture by integrating a High-Resolution (HR) stream for the HMR task. The high-resolution local and global features from the HR stream can be used to recover a more accurate human mesh. POTTER outperforms the SOTA method METRO while requiring only 7% of its total parameters and 14% of its Multiply-Accumulate Operations (MACs), on the Human3.6M (PA-MPJPE metric) and 3DPW (all three metrics) datasets. The project webpage is https://zczcwh.github.io/potter_page.
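The abstract does not spell out how the pooling attention module works, so the sketch below only illustrates the general cost argument: standard self-attention materialises an N×N score matrix over N tokens, while a pooling-based token mixer touches only a small local window per token. The pooling function is a PoolFormer-style mixer (local average pooling minus the input), a swapped-in stand-in and not POTTER's actual pooling attention module; the function names and the 56×56 feature-map size are hypothetical choices for illustration.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head dot-product self-attention with identity Q/K/V projections.

    x: (N, C) tokens. The full (N, N) score matrix is materialised,
    so memory grows quadratically with the number of tokens N.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])        # (N, N) query-key scores
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x                             # (N, C)


def pool_token_mixer(x: np.ndarray, pool_size: int = 3) -> np.ndarray:
    """Hypothetical PoolFormer-style token mixer: local average minus input.

    x: (C, H, W) feature map, stride 1, so the output has the same shape.
    Out-of-bounds cells are skipped rather than zero-padded, so border
    tokens average only their real neighbours. No (N, N) matrix is built.
    """
    _, h, w = x.shape
    r = pool_size // 2
    pooled = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            window = x[:, max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            pooled[:, i, j] = window.mean(axis=(1, 2))
    return pooled - x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, w = 8, 56, 56                  # illustrative feature-map size
    feat = rng.standard_normal((c, h, w))
    print(pool_token_mixer(feat).shape)  # (8, 56, 56)
    n = h * w
    print(n * n)  # scores one attention head would store: 9834496
```

For a 56×56 map, one attention head holds about 9.8 million scores, whereas the pooling mixer's working memory per token is just its 3×3 window; this gap is the kind of overhead a pooling-based design aims to remove.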

Ce Zheng, Xianpeng Liu, Guo-Jun Qi, Chen Chen • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) | 81.4 | 1866
3D Human Mesh Recovery | 3DPW (test) | PA-MPJPE (mm) | 44.8 | 264
Human Mesh Recovery | 3DPW | PA-MPJPE (mm) | 44.8 | 123
3D Human Mesh Recovery | Human3.6M (test) | PA-MPJPE (mm) | 35.1 | 120
3D Human Mesh Recovery | 3DPW | PA-MPJPE (mm) | 44.8 | 72
3D Body Mesh Recovery | Human3.6M | PA-MPJPE (mm) | 35.1 | 46
9-class classification | PathMNIST | Accuracy (%) | 91.45 | 32
Classification | RetinaMNIST | Accuracy (%) | 63.54 | 24
Classification | PneumoniaMNIST | Accuracy (%) | 89.89 | 24
Classification | OrganAMNIST | Accuracy (%) | 96.06 | 14

Showing 10 of 24 rows.
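PA-MPJPE, the metric in most of the mesh-recovery rows above, is the mean per-joint position error computed after aligning the predicted joints to the ground truth with a similarity (Procrustes) transform: scale, rotation, and translation. The sketch below shows one common way to compute it with an SVD-based alignment. It assumes (J, 3) joint arrays in millimetres, is not taken from POTTER's code, and uses illustrative function names.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, gt: (J, 3) joint positions in the same unit (e.g. millimetres).
    """
    return float(np.linalg.norm(pred - gt, axis=1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred to gt with the similarity transform (scale, rotation,
    translation) that minimises the squared error, solved via SVD."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g                 # centre both point sets
    k = p.T @ g                                   # (3, 3) cross-covariance
    u, _, vt = np.linalg.svd(k)
    z = np.eye(3)
    z[-1, -1] = np.sign(np.linalg.det(u @ vt))    # rule out reflections
    rot = vt.T @ z @ u.T                          # optimal rotation
    scale = np.trace(rot @ k) / (p ** 2).sum()    # optimal isotropic scale
    trans = mu_g - scale * rot @ mu_p
    return scale * pred @ rot.T + trans           # transformed (J, 3) joints


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment (PA-MPJPE)."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE measures only errors in body pose and shape. With inputs in millimetres, the result is in millimetres, the unit of the values listed above.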
