HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
About
Human-centric perception includes a variety of vision tasks with widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose HumanBench, a benchmark based on existing datasets, to comprehensively evaluate, on common ground, the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks: person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge of human bodies, we further propose a Projector AssisTed Hierarchical pretraining method (PATH) to learn diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets. The code will be publicly available at https://github.com/OpenGVLab/HumanBench.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Person Re-Identification | Market1501 (test) | Rank-1 Accuracy 95.8 | 1264 |
| Person Re-Identification | Market 1501 | mAP 91.8 | 999 |
| Person Re-Identification | MSMT17 (test) | Rank-1 Acc 84.3 | 499 |
| Person Re-Identification | MSMT17 | mAP 0.691 | 404 |
| Pose Estimation | COCO (val) | AP 77.1 | 319 |
| Person Re-Identification | CUHK03 | -- | 184 |
| Crowd Counting | ShanghaiTech Part B | -- | 160 |
| Crowd Counting | ShanghaiTech Part A | -- | 138 |
| Person Re-Identification | CUHK03 (test) | -- | 108 |
| Pedestrian Attribute Recognition | PA-100K | mA 90.8 | 79 |