
Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation

About

Video-based human-object interaction (HOI) understanding requires both detecting ongoing interactions and anticipating their future evolution. However, existing methods usually treat anticipation as a downstream forecasting task built on externally constructed human-object pairs, limiting joint reasoning between detection and prediction. In addition, sparse keyframe annotations in current benchmarks can temporally misalign nominal future labels with the actual future dynamics, reducing the reliability of anticipation evaluation. To address these issues, we introduce DETAnt-HOI, a temporally corrected benchmark derived from VidHOI and Action Genome for more faithful multi-horizon evaluation, and HOI-DA, a pair-centric framework that jointly performs subject-object localization, present HOI detection, and future anticipation by modeling future interactions as residual transitions from current pair states. Experiments show consistent improvements in both detection and anticipation, with larger gains at longer horizons. Our results highlight that anticipation is most effective when learned jointly with detection as a structural constraint on pair-level video representation learning. Benchmark and code will be publicly available.

Yuanhao Luo, Di Wen, Kunyu Peng, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiale Wei, Rainer Stiefelhagen • 2026
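
The abstract describes future interactions as "residual transitions from current pair states". As a rough illustration of that idea only, and not the HOI-DA implementation, the sketch below treats a pair state as a list of per-interaction scores and obtains the score at each horizon by adding a horizon-specific residual to the current scores. The function name `anticipate`, the score values, and the choice of one residual per horizon are all assumptions made for this example.

```python
from typing import Dict, List


def anticipate(
    current: List[float],
    residuals: Dict[int, List[float]],
) -> Dict[int, List[float]]:
    """Hypothetical helper: future scores = current scores + residual.

    `current` holds one score per interaction class for a single
    human-object pair at the present frame. `residuals` maps a horizon
    (e.g. 1 or 3) to the change expected at that horizon.
    In a real model the residuals would be predicted by a network;
    here they are hard-coded to keep the example self-contained.
    """
    future: Dict[int, List[float]] = {}
    for horizon, delta in residuals.items():
        if len(delta) != len(current):
            raise ValueError("residual and current state must have equal length")
        # The present detection acts as the base; only the change is modeled.
        future[horizon] = [c + d for c, d in zip(current, delta)]
    return future


# Illustrative values only (not taken from the paper).
current_scores = [2.0, -1.0, 0.5]
residuals = {
    1: [0.0, 0.5, -0.5],   # small change at a short horizon
    3: [-1.0, 2.0, -0.5],  # larger change at a longer horizon
}
future_scores = anticipate(current_scores, residuals)
```

Under this reading, a zero residual reproduces the current detection, so anticipation quality is tied directly to the quality of the present pair state.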

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Human-Object Interaction Detection and Anticipation | VidHOI DETAnt-HOI component | Recall@10: 59.92 | 15 |
| Video Human-Object Interaction Detection | VidHOI | mAP (Full): 16.27 | 4 |
| Video Human-Object Interaction Anticipation | VidHOI | mAP (h=1): 16.4 | 3 |
| Video Human-Object Interaction Anticipation | Action Genome | mAP (h=1): 9.22 | 3 |
| Video Human-Object Interaction Anticipation | Action Genome | Recall@10 (h=1): 29.06 | 3 |
| Video Human-Object Interaction Detection | Action Genome | mAP (Full): 9.7 | 3 |
| Video Human-Object Interaction Detection | Action Genome DETAnt-HOI | Recall@10 (h=0): 28.89 | 3 |
