Learning Human-Object Interactions by Graph Parsing Neural Networks
About
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure, represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatial-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn.
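The iterative inference described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (which uses learned link and message functions with GRU-style updates); the weights, shapes, and `tanh` update below are stand-in assumptions showing the alternation between estimating a soft adjacency matrix and passing messages over it:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gpnn_step(h, edge_feats, w_link, W_msg):
    """One message-passing iteration (toy version): infer a soft
    adjacency matrix from pairwise edge features, then propagate
    messages over it to update the node states."""
    A = sigmoid(edge_feats @ w_link)        # (n, n) soft adjacency
    messages = A @ (h @ W_msg)              # aggregate weighted neighbor messages
    h_new = np.tanh(h + messages)           # simple state update (GRU in the paper)
    return h_new, A

# Toy scene: 4 nodes (e.g. 1 human + 3 objects), 8-dim features.
rng = np.random.default_rng(0)
n, d = 4, 8
h = rng.normal(size=(n, d))                 # node features
edge_feats = rng.normal(size=(n, n, d))     # pairwise edge features
w_link = rng.normal(size=d)                 # stand-in link-function weights
W_msg = rng.normal(scale=0.1, size=(d, d))  # stand-in message-function weights

for _ in range(3):                          # iterative parse-graph inference
    h, A = gpnn_step(h, edge_feats, w_link, W_msg)

print(A.shape, h.shape)                     # adjacency and updated node states
```

In the full model, node labels (HOI classes) would be read out from the final node states `h`, and all functions are trained end-to-end by backpropagating through the message passing steps.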
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Human-Object Interaction Detection | HICO-DET (test) | mAP (Full) | 13.11 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) | 44 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 13.11 | 233 |
| Human-Object Interaction Detection | V-COCO 1.0 (test) | AP_role (#1) | 44 | 76 |
| HOI Detection | HICO-DET (test) | Box mAP (Full) | 13.11 | 32 |
| Human-Object Interaction Detection | V-COCO | Box mAP (Scenario 1) | 44 | 32 |
| HOI Detection | VidHOI (val) | mAP (Full) | 18.47 | 23 |
| Human-Object Interaction Detection | V-COCO | AP (Role) | 44 | 23 |
| Human-Object Interaction Detection | HICO-DET 9 (test) | mAP (Full) | 13.11 | 21 |
| Human-Object Interaction Detection | V-COCO standard (test) | AP (Role 1) | 44 | 18 |