
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection

About

Human-Object Interaction (HOI) detection, which localizes humans and objects and infers the relationships between them, plays an important role in scene understanding. Although two-stage HOI detectors are efficient in both training and inference, they perform worse than one-stage methods because they rely on older backbone networks and because their interaction classifiers do not account for how humans perceive HOI. In this paper, we propose the Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems. First, we propose a novel feature extraction method suited to the Vision Transformer backbone, called the masking with overlapped area (MOA) module. The MOA module uses the overlapped area between each patch and the given region in the attention function, which addresses the quantization problem that arises when using the Vision Transformer backbone. In addition, we design a graph with a pose-conditioned self-loop structure, which updates the human node encoding with local features of human joints. This allows the classifier to focus on specific human joints to identify the type of interaction effectively, a design motivated by the human perception process for HOI. As a result, ViPLO achieves state-of-the-art results on two public benchmarks, notably a +2.07 mAP gain on the HICO-DET dataset. The source code is available at https://github.com/Jeeseung-Park/ViPLO.
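To make the overlapped-area idea concrete, the following is a minimal sketch and not the authors' implementation. It computes, for each square patch in a Vision Transformer grid, the fraction of the patch covered by a bounding box, and then converts those fractions into an additive attention mask. The function names, the `(x1, y1, x2, y2)` box format, and the use of a logarithm to form the mask are assumptions made for illustration; the abstract only states that the overlapped area enters the attention function.

```python
import math


def patch_overlap_fractions(box, image_size, patch_size):
    """Return a row-major grid with the fraction of each patch covered by `box`.

    box: (x1, y1, x2, y2) in pixels (hypothetical format for this sketch).
    image_size: (width, height) in pixels, assumed divisible by patch_size.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    cols = width // patch_size
    rows = height // patch_size
    patch_area = patch_size * patch_size
    grid = []
    for r in range(rows):
        py1, py2 = r * patch_size, (r + 1) * patch_size
        row = []
        for c in range(cols):
            px1, px2 = c * patch_size, (c + 1) * patch_size
            # Width and height of the intersection rectangle, clamped at zero.
            overlap_w = max(0.0, min(x2, px2) - max(x1, px1))
            overlap_h = max(0.0, min(y2, py2) - max(y1, py1))
            row.append(overlap_w * overlap_h / patch_area)
        grid.append(row)
    return grid


def additive_attention_mask(fractions):
    """Assumed mapping: log(fraction) per patch, -inf where there is no overlap.

    Added to attention logits before softmax, this scales each patch's weight
    by its covered fraction instead of applying a hard in/out decision.
    """
    return [
        [math.log(f) if f > 0 else float("-inf") for f in row]
        for row in fractions
    ]


# A 64x32 image with 16-pixel patches gives a 2x4 grid of patches.
fractions = patch_overlap_fractions(
    box=(8, 8, 40, 24), image_size=(64, 32), patch_size=16
)
mask = additive_attention_mask(fractions)
print(fractions[0])  # fractions for the top row of patches
```

A hard binary mask would have to round the box to patch boundaries, which is the quantization problem the abstract refers to. Keeping fractional coverage lets partially covered patches contribute in proportion to their overlap.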

Jeeseung Park, Jin-Woo Park, Jong-Seok Lee • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Human-Object Interaction Detection | HICO-DET (test) | mAP (full) | 62.09 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) | 62.2 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 37.22 | 233 |
| Human-Object Interaction Detection | HICO-DET Known Object (test) | mAP (Full) | 40.61 | 112 |
| Human-Object Interaction Detection | V-COCO 1.0 (test) | AP_role (#1) | 62.2 | 76 |
| Human-Object Interaction Detection | V-COCO | AP^1 Role | 62.2 | 65 |
| HOI Detection | V-COCO | AP Role 1 | 62.2 | 40 |
| HOI Detection | HICO-DET (test) | Box mAP (Full) | 37.2 | 32 |
| HOI Detection | HICO-DET | mAP (Default Full) | 37.22 | 21 |
| HOI Segmentation | HICO-DET (test) | mask mAP (Full) | 39.1 | 12 |
Showing 10 of 11 rows
