
Learning Human-Object Interactions by Graph Parsing Neural Networks

About

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure, represented by an adjacency matrix, and ii) the node labels. Within a message-passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn.
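To make the inference loop in the abstract concrete, below is a minimal, self-contained sketch of one GPNN-style iteration: a link function produces a soft adjacency matrix from pairwise node features, a message function aggregates neighbor features weighted by that adjacency, and an update function refines the node states. This is an illustrative NumPy toy, not the authors' implementation; all weight matrices and function names here are hypothetical, and the real model learns these functions end-to-end (see the linked repository).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gpnn_step(node_feats, W_link, W_msg, W_update):
    """One illustrative GPNN-style inference iteration:
    1) link function: soft adjacency from pairwise node features
    2) message function: neighbor aggregation weighted by adjacency
    3) update function: refine node states from messages
    """
    n, d = node_feats.shape
    # Pairwise features: concatenate features of every (i, j) node pair.
    pairs = np.concatenate(
        [np.repeat(node_feats, n, axis=0), np.tile(node_feats, (n, 1))],
        axis=1,
    )  # shape (n*n, 2d)
    adjacency = sigmoid(pairs @ W_link).reshape(n, n)  # soft edge strengths
    messages = adjacency @ (node_feats @ W_msg)        # weighted aggregation
    node_feats = np.tanh(node_feats @ W_update + messages)
    return node_feats, adjacency

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))              # 4 nodes, e.g. 1 human + 3 objects
W_link = rng.standard_normal((2 * d, 1)) * 0.1
W_msg = rng.standard_normal((d, d)) * 0.1
W_upd = rng.standard_normal((d, d)) * 0.1
for _ in range(3):                           # iterative parse-graph inference
    x, A = gpnn_step(x, W_link, W_msg, W_upd)
print(A.shape, x.shape)                      # (4, 4) (4, 8)
```

In the full model, node labels (HOI classes) would be read out from the final node states, and the learned adjacency serves as the inferred parse-graph structure.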

Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu • 2018

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Human-Object Interaction Detection | HICO-DET (test) | mAP (Full) | 13.11 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) | 44 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 13.11 | 233 |
| Human-Object Interaction Detection | V-COCO 1.0 (test) | AP_role (#1) | 44 | 76 |
| HOI Detection | HICO-DET (test) | Box mAP (Full) | 13.11 | 32 |
| Human-Object Interaction Detection | V-COCO | Box mAP (Scenario 1) | 44 | 32 |
| HOI Detection | VidHOI (val) | mAP (Full) | 18.47 | 23 |
| Human-Object Interaction Detection | V-COCO | AP (Role) | 44 | 23 |
| Human-Object Interaction Detection | HICO-DET (test) | mAP (Full) | 13.11 | 21 |
| Human-Object Interaction Detection | V-COCO standard (test) | AP (Role 1) | 44 | 18 |

Showing 10 of 18 rows.

Other info

Code: https://github.com/SiyuanQi/gpnn
