UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
About
Recent advances in deep neural networks have achieved significant progress in detecting individual objects in an image. However, object detection alone is not sufficient to fully understand a visual scene. For deeper visual understanding, the interactions between objects, especially between humans and objects, are essential. Most prior works obtain this information with a bottom-up approach, in which the objects are detected first and the interactions are then predicted sequentially by pairing the detected objects. This sequential pairing is a major bottleneck in HOI detection inference time. To tackle this problem, we propose UnionDet, a one-stage meta-architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. Our one-stage detector for human-object interaction reduces interaction prediction time by 4x to 14x while outperforming state-of-the-art methods on two public datasets: V-COCO and HICO-DET.
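As a rough illustration of the "region of interaction" mentioned above, the sketch below computes a union box for one human-object pair. It assumes the region is the smallest axis-aligned box enclosing both the human box and the object box; the abstract does not spell out this definition, and the names `Box` and `union_box` are hypothetical, not taken from the authors' code.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in (x1, y1, x2, y2) pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def union_box(human: Box, obj: Box) -> Box:
    """Smallest axis-aligned box that encloses both input boxes.

    Assumption: this is what a union-level detector would regress directly,
    instead of detecting both boxes and pairing them afterwards.
    """
    return Box(
        min(human.x1, obj.x1),
        min(human.y1, obj.y1),
        max(human.x2, obj.x2),
        max(human.y2, obj.y2),
    )


# Example pair: a person and a nearby object that partly overlap.
human = Box(10, 20, 50, 120)
obj = Box(40, 80, 90, 130)
region = union_box(human, obj)
```

In a bottom-up pipeline this box would be derived after detection and pairing; the abstract's claim is that predicting such a region directly removes that extra stage.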
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human-Object Interaction Detection | HICO-DET (test) | mAP (full) | 19.76 | 493 |
| Human-Object Interaction Detection | V-COCO (test) | AP (Role, Scenario 1) | 56.1 | 270 |
| Human-Object Interaction Detection | HICO-DET | mAP (Full) | 19.76 | 233 |
| Human-Object Interaction Detection | HICO-DET Known Object (test) | mAP (Full) | 19.76 | 112 |
| Human-Object Interaction Detection | V-COCO 1.0 (test) | AP_role (#1) | 47.5 | 76 |
| Human-Object Interaction Detection | V-COCO | AP^1 Role | 47.5 | 65 |
| HOI Detection | V-COCO | AP Role 1 | 47.5 | 40 |
| HOI Detection | HICO-DET | mAP (Rare) | 11.72 | 34 |
| Human-Object Interaction Detection | V-COCO | Box mAP (Scenario 1) | 47.5 | 32 |
| HOI Detection | V-COCO v1 (test) | AP Role (Scenario 1) | 47.5 | 25 |