
Open-World Human-Object Interaction Detection via Multi-modal Prompts

About

In this paper, we develop MP-HOI, a Multi-modal Prompt-based HOI detector that leverages textual descriptions for open-set generalization and visual exemplars for resolving highly ambiguous descriptions, enabling HOI detection in the open world. Specifically, it integrates visual prompts into existing language-guided-only HOI detectors to handle cases where textual descriptions generalize poorly and to address complex scenarios with high interaction ambiguity. To facilitate MP-HOI training, we build a large-scale HOI dataset named Magic-HOI, which merges six existing datasets into a unified label space, comprising over 186K images with 2.4K objects, 1.2K actions, and 20K HOI interactions. Furthermore, to tackle the long-tail issue within the Magic-HOI dataset, we introduce an automated pipeline for generating realistically annotated HOI images and present SynHOI, a high-quality synthetic HOI dataset containing 100K images. Leveraging these two datasets, MP-HOI formulates the HOI task as similarity learning between multi-modal prompts and objects/interactions via a unified contrastive loss, in order to learn generalizable and transferable object/interaction representations from large-scale data. MP-HOI can serve as a generalist HOI detector, with an HOI vocabulary more than 30 times larger than that of existing expert models. Our results also demonstrate that MP-HOI exhibits remarkable zero-shot capability in real-world scenarios and consistently achieves new state-of-the-art performance across various benchmarks.

Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang • 2024
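
The abstract frames HOI detection as similarity learning between prompt embeddings and object/interaction embeddings under a contrastive loss. The exact loss used by MP-HOI is not given here, so the following is only a generic InfoNCE-style sketch under stated assumptions: the function names, the toy 2-D embeddings, and the temperature of 0.07 are all hypothetical, and text and visual prompts are assumed to share one embedding bank.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(query_embeddings, prompt_embeddings, targets, temperature=0.07):
    """Mean cross-entropy over softmax(similarity / temperature).

    query_embeddings: object or interaction embeddings (hypothetical detector output)
    prompt_embeddings: one shared bank of text and/or visual prompt embeddings
    targets: index of the matching prompt for each query
    temperature: assumed value, not taken from the paper
    """
    total = 0.0
    for query, target in zip(query_embeddings, targets):
        logits = [cosine_similarity(query, p) / temperature for p in prompt_embeddings]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[target]
    return total / len(query_embeddings)


# Toy example: three prompts (text or visual) and two queries.
prompts = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.1, 0.9]]

aligned = contrastive_loss(queries, prompts, targets=[0, 1])
misaligned = contrastive_loss(queries, prompts, targets=[1, 0])
```

In this toy setup, `aligned` is smaller than `misaligned`, because each query is most similar to its own target prompt. Treating text and visual prompts as rows of the same bank is one way to read "unified" in the abstract; the paper may do this differently.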

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human-Object Interaction Detection | HICO-DET (test) | mAP (full) | 44.53 | 493 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 44.53 | 233 |
| Human-Object Interaction Detection | V-COCO | AP^1 Role | 66.2 | 65 |
| HOI Detection | V-COCO v1 (test) | AP Role (Scenario 1) | 66.2 | 25 |
| Human-Object Interaction Detection | SWIG-HOI Rare (test) | mAP | 14.78 | 11 |
| Human-Object Interaction Detection | SWIG-HOI Non-rare (test) | mAP | 20.28 | 11 |
| Human-Object Interaction Detection | SWIG-HOI (Full) | mAP | 12.61 | 8 |
| Human-Object Interaction Detection | SWIG HOI (test) | mAP (Non-rare) | 20.28 | 7 |
| HOI Detection | HCVRD (test) | mAP (Full) | 11.29 | 3 |
| Human-Object Interaction Detection | SWiG | mAP (Full) | 16.21 | 3 |
