
Detecting Human-Object Interaction via Fabricated Compositional Learning

About

Human-Object Interaction (HOI) detection, which infers the relationships between humans and objects from images or videos, is a fundamental task for high-level scene understanding. However, HOI detection usually suffers from the open long-tailed nature of interactions with objects, whereas humans have an extremely powerful compositional perception ability that lets them recognize rare or unseen HOI samples. Inspired by this, we devise a novel HOI compositional learning framework, termed Fabricated Compositional Learning (FCL), to address the problem of open long-tailed HOI detection. Specifically, we introduce an object fabricator to generate effective object representations, and then combine verbs and fabricated objects to compose new HOI samples. With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection. Extensive experiments on the most popular HOI detection dataset, HICO-DET, demonstrate the effectiveness of the proposed method for imbalanced HOI detection and show that it significantly improves the state-of-the-art performance on rare and unseen HOI categories. Code is available at https://github.com/zhihou7/HOI-CL.
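The composition step described in the abstract (pair verb features with fabricated object features to form new HOI samples) can be illustrated with a toy sketch. This is not the authors' implementation, which is in the linked repository. The function names `fabricate_object` and `compose_hoi_samples`, the list-based feature vectors, and the use of Gaussian noise around a per-category embedding as the "fabricator" are all assumptions made purely for illustration.

```python
import random


def fabricate_object(object_embedding, noise_scale=0.1, rng=None):
    """Hypothetical stand-in for an object fabricator.

    Perturbs a per-category embedding with Gaussian noise to produce a
    synthetic object feature. A learned generator would replace this.
    """
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, noise_scale) for x in object_embedding]


def compose_hoi_samples(verb_features, object_embeddings, valid_pairs, rng=None):
    """Combine each verb feature with a fabricated object feature.

    verb_features:     dict mapping verb name -> list of feature vectors
    object_embeddings: dict mapping object name -> embedding vector
    valid_pairs:       set of (verb, object) combinations to compose
    Returns a list of (verb, object, feature) tuples, where feature is the
    verb feature concatenated with the fabricated object feature.
    """
    rng = rng or random.Random(0)
    samples = []
    for verb, obj in sorted(valid_pairs):
        for verb_feat in verb_features.get(verb, []):
            obj_feat = fabricate_object(object_embeddings[obj], rng=rng)
            samples.append((verb, obj, verb_feat + obj_feat))
    return samples


# Toy data: "ride zebra" stands in for a rare or unseen combination.
verb_features = {"ride": [[0.9, 0.1]], "feed": [[0.2, 0.8]]}
object_embeddings = {"horse": [1.0, 0.0, 0.0], "zebra": [0.0, 1.0, 0.0]}
valid_pairs = {("ride", "horse"), ("ride", "zebra"), ("feed", "zebra")}

samples = compose_hoi_samples(
    verb_features, object_embeddings, valid_pairs, rng=random.Random(0)
)
for verb, obj, feature in samples:
    print(verb, obj, len(feature))
```

Because the object side is generated rather than sampled from images, a combination such as "ride zebra" can receive training samples even when no such image exists in the dataset, which is the intuition behind using fabrication against the long tail.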

Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Human-Object Interaction Detection | HICO-DET (test) | mAP (full) 45.25 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) 52.4 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) 29.12 | 233 |
| Human-Object Interaction Detection | HICO-DET Known Object (test) | mAP (Full) 31.31 | 112 |
| Human-Object Interaction Detection | HICO-DET (Rare First Unseen Combination (RF-UC)) | mAP (Full) 22.01 | 77 |
| Human-Object Interaction Detection | HICO-DET Non-rare First Unseen Composition (NF-UC) | AP (Unseen) 18.66 | 49 |
| Human-Object Interaction Detection | HICO-DET (NF-UC) | mAP (Full) 19.37 | 40 |
| Predicate Detection | Visual Relation Detection (VRD) (All) | -- | 40 |
| Predicate Detection | Visual Relation Detection (VRD) Zero-shot | -- | 34 |
| Human-Object Interaction Detection | HICO-DET Zero-Shot | mAP (Default Unseen) 18.66 | 33 |
Showing 10 of 33 rows
