
What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

About

We propose a novel one-stage Transformer-based Semantic and Spatial Refined Transformer (SSRT) for the Human-Object Interaction (HOI) detection task, which requires localizing humans and objects and predicting their interactions. Unlike previous Transformer-based HOI approaches, which mostly focus on improving the design of the decoder outputs for the final detection, SSRT introduces two new modules that select the most relevant object-action pairs within an image and refine the queries' representations using rich semantic and spatial features. These enhancements lead to state-of-the-art results on the two most popular HOI benchmarks: V-COCO and HICO-DET.
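The two refinement steps described above can be sketched in code. This is a minimal, hypothetical illustration (the function names, dimensions, and the simple softmax-attention refinement are assumptions for illustration, not the authors' actual implementation): first select the highest-scoring object-action pairs, then add an attention-weighted summary of their features to the decoder queries.

```python
import numpy as np

def select_pairs(pair_scores, k):
    """Select the indices of the top-k object-action pairs by score
    (illustrative stand-in for SSRT's support-feature selection)."""
    return np.argsort(pair_scores)[::-1][:k]

def refine_queries(queries, pair_features, scale=0.1):
    """Refine decoder queries with features of the selected pairs via a
    simple softmax attention (hypothetical sketch, not the paper's module)."""
    logits = queries @ pair_features.T                    # (num_queries, k)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # softmax over pairs
    return queries + scale * (weights @ pair_features)

rng = np.random.default_rng(0)
pair_scores = rng.random(600)                 # one score per candidate pair
queries = rng.standard_normal((100, 256))     # 100 decoder queries, dim 256
pair_features = rng.standard_normal((5, 256)) # features of 5 selected pairs

top5 = select_pairs(pair_scores, 5)
refined = refine_queries(queries, pair_features)
print(top5.shape, refined.shape)
```

The key design point the sketch captures is that query refinement happens before the final detection heads, so each query already carries image-level semantic context about which interactions are plausible.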

A S M Iftekhar, Hao Chen, Kaustav Kundu, Xinyu Li, Joseph Tighe, Davide Modolo • 2022

Related benchmarks

Task                               | Dataset              | Result                     | Rank
Human-Object Interaction Detection | HICO-DET (test)      | --                         | 493
Human-Object Interaction Detection | V-COCO (test)        | AP (Role, Scenario 1): 65  | 270
Human-Object Interaction Detection | HICO-DET             | --                         | 233
Human-Object Interaction Detection | V-COCO               | AP^1 Role: 63.7            | 65
HOI Detection                      | HICO-DET v1.0 (test) | mAP (Default, Full): 31.34 | 29
