
Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

About

Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses a merit of two-stage methods: the object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM so that it aligns with the HOI dataset's priors. To overcome the static semantic embedding problem, we propose to generate cross-modality-aware visual and semantic features by Cross-Modal Calibration (CMC). Together, these modules compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating statistical prior knowledge and yield state-of-the-art performance. More detailed analysis indicates that the proposed modules serve as a stronger verb predictor and a superior way of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.
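
As a rough illustration of what an object-guided statistical prior can look like, the sketch below estimates P(verb | object) by counting co-occurrences in a toy list of (human, verb, object) annotations. It is not the paper's VSM, SKL loss, or CMC module; the function name `verb_prior_given_object` and the toy triplets are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def verb_prior_given_object(triplets):
    """Estimate P(verb | object) by counting (human, verb, object) annotations.

    Hypothetical helper for illustration only; not taken from the OCN code.
    """
    counts = defaultdict(Counter)
    for _human, verb, obj in triplets:
        counts[obj][verb] += 1

    prior = {}
    for obj, verb_counts in counts.items():
        total = sum(verb_counts.values())
        # Normalise counts so the verb probabilities for each object sum to 1.
        prior[obj] = {verb: n / total for verb, n in verb_counts.items()}
    return prior


# Toy annotations (made up for this example).
toy_triplets = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "hold", "bicycle"),
    ("person", "hold", "cup"),
]

prior = verb_prior_given_object(toy_triplets)
print(prior["bicycle"])
print(prior["cup"])
```

In this toy data, knowing the object is a bicycle already makes "ride" twice as likely as "hold", which is the kind of clue the abstract says parallel verb prediction does not exploit.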

Hangjie Yuan, Mang Wang, Dong Ni, Liangpeng Xu • 2022

Related benchmarks

Task                                 Dataset                  Metric                  Result  Rank
Human-Object Interaction Detection   HICO-DET (test)          mAP (full)              31.43   493
Human-Object Interaction Detection   V-COCO (test)            AP (Role, Scenario 1)   65.3    270
Human-Object Interaction Detection   HICO-DET                 mAP (Full)              31.43   233
HOI Detection                        V-COCO                   AP Role 1               64.2    40
HOI Detection                        HICO-DET                 mAP (Rare)              25.56   34
HOI Detection                        V-COCO v1 (test)         AP Role (Scenario 1)    64.2    25
HOI Detection                        HICO-DET v1 (test)       mAP (Rare)              25.56   24
Human-Object Interaction Detection   V-COCO standard (test)   AP (Role 1)             64.2    18
