
The Overlooked Classifier in Human-Object Interaction Recognition

About

Human-Object Interaction (HOI) recognition is challenging for two reasons: (1) classes are significantly imbalanced, and (2) each image may require multiple labels. This paper shows that both challenges can be addressed effectively by improving the classifier while leaving the backbone architecture untouched. First, we encode the semantic correlation among classes into the classification head by initializing its weights with language embeddings of the HOIs. This boosts performance significantly, especially on the few-shot subset. Second, we propose a new loss, LSE-Sign, to enhance multi-label learning on a long-tailed dataset. Our simple yet effective method enables detection-free HOI classification and outperforms, by a clear margin, state-of-the-art methods that require object detection and human pose. Moreover, we transfer the classification model to instance-level HOI detection by connecting it to an off-the-shelf object detector, achieving state-of-the-art results without additional fine-tuning.

Ying Jin, Yinpeng Chen, Lijuan Wang, Jianfeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu • 2021
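
The two classifier changes described in the abstract can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code. `embed_word`, `embed_label`, `init_head`, `logits`, `lse_sign_loss` and `DIM` are all hypothetical names. `embed_word` stands in for a pretrained language model by mapping each word to a fixed pseudo-random vector, so labels that share a verb or object (for example "ride horse" and "feed horse") start with correlated weight rows. The head is assumed to be a bias-free linear layer. The loss assumes a log-sum-exp over sign-weighted scores, log(1 + Σ_i exp(−y_i·s_i)) with y_i ∈ {−1, +1}. The name suggests this form, but the abstract does not spell it out.

```python
import math
import random

DIM = 300  # hypothetical embedding size, similar to common word-vector sizes


def embed_word(word, dim=DIM):
    """Hypothetical stand-in for a pretrained word embedding.

    A private RNG seeded with the word maps the same word to the same
    Gaussian vector on every run (str seeds are hashed deterministically).
    """
    rng = random.Random(word)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def embed_label(label, dim=DIM):
    """Average the word vectors of an HOI label such as 'ride horse', then L2-normalize."""
    words = label.split()
    vec = [0.0] * dim
    for word in words:
        for i, x in enumerate(embed_word(word, dim)):
            vec[i] += x / len(words)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def init_head(labels, dim=DIM):
    """Bias-free linear head: one weight row per HOI class, set from its label embedding."""
    return [embed_label(label, dim) for label in labels]


def logits(head, feature):
    """Class scores as dot products between each weight row and an image feature."""
    return [sum(w * f for w, f in zip(row, feature)) for row in head]


def lse_sign_loss(scores, targets):
    """Assumed LSE-Sign form: log(1 + sum_i exp(-y_i * s_i)),
    with y_i = +1 for positive labels (target 1) and -1 for negatives (target 0).

    The constant 1 is folded in as an extra zero term so the log-sum-exp
    can be shifted by its maximum for numerical stability.
    """
    terms = [0.0] + [-(1 if t else -1) * s for s, t in zip(scores, targets)]
    m = max(terms)
    return m + math.log(sum(math.exp(z - m) for z in terms))


def cosine(a, b):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


labels = ["ride horse", "feed horse", "ride bicycle", "hold cup"]
head = init_head(labels)
print(round(cosine(head[0], head[1]), 2))  # rows share the word "horse"
print(round(cosine(head[0], head[3]), 2))  # rows share no word
scores = logits(head, embed_label("ride horse"))  # toy "image feature"
print(lse_sign_loss(scores, [1, 0, 0, 0]))
print(lse_sign_loss([0.0, 0.0], [1, 0]))  # log(3), about 1.0986
```

In a real system the rows would come from an actual text encoder and the head would then be trained together with the image backbone. The pseudo-random vectors here only mimic the property the abstract relies on: classes that share words start out close together.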

Related benchmarks

Task | Dataset | Metric | Result (%) | Leaderboard rank
Human-Object Interaction Detection | HICO-DET | mAP (Full) | 35.36 | 233
HOI Classification | HICO (test) | mAP | 65.6 | 10
Human-Object Interaction Recognition | MPII (val) | mAP | 55.3 | 5
HOI Classification | HICO Full (test) | mAP | 65.6 | 4
HOI Classification | HICO Few-shot Few@1 (test) | mAP | 52.7 | 4
HOI Classification | HICO Few-shot Few@5 (test) | mAP | 56.9 | 4
HOI Classification | HICO Few-shot Few@10 (test) | mAP | 57.2 | 4
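
All results above are reported as mAP. As a rough illustration of what this metric averages, the sketch below computes non-interpolated average precision (AP) per class from ranked scores, then takes the mean over classes. It uses a generic textbook definition. The function names are hypothetical, and the official HICO and HICO-DET evaluation code may differ in details such as tie handling, ignored samples, and, for detection, how predicted human-object pairs are matched to ground truth.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at each rank where a positive appears.

    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """score_matrix[i][c] and label_matrix[i][c] hold image i, class c; AP is averaged over classes."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):  # skip classes with no positives in this toy set
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps) if aps else 0.0


# Three images, two classes (multi-label: each row may contain several 1s).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
labels = [[1, 0], [0, 1], [1, 0]]
print(round(mean_average_precision(scores, labels), 4))  # 0.9167
```

Here class 0 gets AP = (1/1 + 2/3) / 2 = 5/6, and class 1 gets AP = 1, so the mAP is 11/12. Because every class contributes equally to the mean, this averaging is also why rare (few-shot) classes matter for the headline number.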
