PaStaNet: Toward Human Activity Knowledge Engine

About

Existing image-based activity understanding methods mainly adopt direct mapping, i.e., from images to activity concepts, which may hit a performance bottleneck because of the huge gap between the two. In light of this, we propose a new path: first infer human part states, then reason out the activities based on part-level semantics. Human Body Part States (PaSta) are fine-grained action semantic tokens, e.g., <hand, hold, something>, which can compose activities and help us step toward a human activity knowledge engine. To fully utilize the power of PaSta, we build a large-scale knowledge base, PaStaNet, which contains 7M+ PaSta annotations. We also propose two corresponding models. First, we design a model named Activity2Vec to extract PaSta features, which aim to be general representations for various activities. Second, we use a PaSta-based reasoning method to infer activities. Promoted by PaStaNet, our method achieves significant improvements, e.g., 6.4 and 13.9 mAP on the full and one-shot sets of HICO in supervised learning, and 3.2 and 4.2 mAP on V-COCO and image-based AVA in transfer learning. Code and data are available at http://hake-mvig.cn/.
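The two-stage idea in the abstract (part states first, activities second) can be illustrated with a toy sketch. This is not Activity2Vec or the authors' reasoning method: the `PaSta` tuple, the `ACTIVITY_PARTS` table, and the overlap-based scoring are all hypothetical stand-ins, and the part states are given by hand rather than predicted from an image.

```python
from typing import NamedTuple


class PaSta(NamedTuple):
    """A part-state token such as <hand, hold, something>."""
    part: str
    verb: str
    obj: str


# Hypothetical activity definitions: each activity maps to a set of part states.
# These entries are invented for illustration and are not taken from PaStaNet.
ACTIVITY_PARTS = {
    "drink_with_cup": {
        PaSta("hand", "hold", "something"),
        PaSta("head", "drink_with", "something"),
    },
    "ride_bicycle": {
        PaSta("hand", "hold", "something"),
        PaSta("hip", "sit_on", "something"),
        PaSta("foot", "step_on", "something"),
    },
}


def score_activities(observed):
    """Score each activity by the fraction of its part states that were observed."""
    return {
        activity: len(required & observed) / len(required)
        for activity, required in ACTIVITY_PARTS.items()
    }


def infer_activity(observed):
    """Return the activity whose part states best match the observation."""
    scores = score_activities(observed)
    return max(scores, key=scores.get)


# Stage 1 (assumed done elsewhere): part states recognized for one person.
observed = {
    PaSta("hand", "hold", "something"),
    PaSta("hip", "sit_on", "something"),
    PaSta("foot", "step_on", "something"),
}

# Stage 2: reason from part states to an activity.
print(infer_activity(observed))
```

The shared token <hand, hold, something> contributes to both activities, which hints at why part-level representations might transfer across activities; a learned model would replace the hand-written table and the set-overlap score.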

Yong-Lu Li, Liang Xu, Xinpeng Liu, Xijie Huang, Yue Xu, Shiyi Wang, Hao-Shu Fang, Ze Ma, Mingyang Chen, Cewu Lu • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human-Object Interaction Detection | HICO-DET (test) | mAP (full) | 34.86 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) | 51 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 22.65 | 233 |
| HOI Classification | HICO (test) | mAP | 46.3 | 10 |
| Instance Activity Detection | AVA image-based (val) | mAP | 24.3 | 6 |
| HOI Classification | HICO Full (test) | mAP | 46.3 | 4 |
| HOI Classification | HICO Few-shot Few@1 (test) | mAP | 24.7 | 4 |
| HOI Classification | HICO Few@5 Few-shot (test) | mAP | 31.8 | 4 |
| HOI Classification | HICO Few-shot Few@10 (test) | mAP | 33.1 | 4 |
