Neural-Logic Human-Object Interaction Detection

About

The interaction decoder utilized in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as inputs. Though achieving remarkable performance, such a paradigm lacks feasibility and cannot explore novel combinations over entities during decoding. We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible interactions between entities. Specifically, we modify the self-attention mechanism in the vanilla Transformer, enabling it to reason over the <human, action, object> triplet and constitute novel interactions. Meanwhile, this reasoning process is guided by two crucial properties for understanding HOI: affordances (the potential actions an object can facilitate) and proxemics (the spatial relations between humans and objects). We formulate these two properties in first-order logic and ground them into continuous space to constrain the learning process of our approach, leading to improved performance and zero-shot generalization capabilities. We evaluate LogicHOI on V-COCO and HICO-DET under both normal and zero-shot setups, achieving significant improvements over existing methods.

Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang • 2023
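
The abstract mentions formulating affordances in first-order logic and grounding them into continuous space. The listing does not give the paper's actual rules or relaxation, so the following is only a minimal sketch of the general idea under stated assumptions: a hypothetical rule "if the action is predicted, then some object that affords it is predicted", relaxed with the product t-conorm and the Reichenbach implication. The function names, the example classes, and the choice of fuzzy operators are all illustrative and are not taken from LogicHOI.

```python
from typing import Dict, Iterable, List


def fuzzy_or(values: Iterable[float]) -> float:
    """Product t-conorm (probabilistic sum): 1 - prod(1 - v)."""
    remainder = 1.0
    for v in values:
        remainder *= 1.0 - v
    return 1.0 - remainder


def fuzzy_implies(antecedent: float, consequent: float) -> float:
    """Reichenbach implication: 1 - a + a * b (assumed relaxation)."""
    return 1.0 - antecedent + antecedent * consequent


def affordance_loss(action_prob: float,
                    object_probs: Dict[str, float],
                    affording_objects: List[str]) -> float:
    """Loss for the hypothetical rule: action -> OR over affording objects.

    The rule's truth value lies in [0, 1]; the loss is 1 - truth, so a
    prediction that violates the rule is penalized more heavily.
    """
    consequent = fuzzy_or(object_probs[o] for o in affording_objects)
    return 1.0 - fuzzy_implies(action_prob, consequent)


# Hypothetical affordance: "ride" is afforded by bicycles and horses.
ride_objects = ["bicycle", "horse"]

# Prediction consistent with the rule: "ride" with a confident bicycle.
consistent = affordance_loss(
    0.9, {"bicycle": 0.9, "horse": 0.05, "cup": 0.05}, ride_objects)

# Prediction violating the rule: "ride" with a confident cup.
violating = affordance_loss(
    0.9, {"bicycle": 0.1, "horse": 0.05, "cup": 0.85}, ride_objects)

print(f"consistent loss: {consistent:.4f}")
print(f"violating loss:  {violating:.4f}")
```

Because every operator here is differentiable in its inputs, a term of this kind could in principle be added to a detector's training objective as a soft constraint; how LogicHOI actually does this is described in the paper, not on this page.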

Related benchmarks

Task	Dataset	Metric	Result
Human-Object Interaction Detection	HICO-DET (test)	mAP (full)	35.47	544
Human-Object Interaction Detection	V-COCO (test)	AP (Role, Scenario 1)	64.4	270
Human-Object Interaction Detection	HICO-DET	mAP (Full)	35.47	263
Human-Object Interaction Detection	HICO-DET (Rare First Unseen Combination (RF-UC))	mAP (Full)	33.17	77
Human-Object Interaction Detection	HICO-DET (NF-UC)	mAP (Full)	27.95	56
Human-Object Interaction Detection	HICO-DET Non-rare First Unseen Composition (NF-UC)	AP (Unseen)	26.84	49
Human-Object Interaction Detection	HICO-DET (UO)	mAP (Full)	33.17	47
Human-Object Interaction Detection	V-COCO	AP (Role)	65.6	23
Human-Object Interaction Detection	HICO-DET computed-box setting	mAP (Full)	2.16	23
Human-Object Interaction Detection	HICO-DET closed setting	Performance Score (Rare)	32.03	18
