
Modeling Multi-Label Action Dependencies for Temporal Action Localization

About

Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occur at the same time-step and actions that occur at different time-steps (i.e., those which precede or follow each other). We define these distinct relationships as action dependencies. We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer. The MLAD layer consists of two branches: a Co-occurrence Dependency Branch and a Temporal Dependency Branch, which model co-occurrence action dependencies and temporal action dependencies, respectively. We observe that existing metrics used for multi-label classification do not explicitly measure how well action dependencies are modeled; therefore, we propose novel metrics that consider both co-occurrence and temporal dependencies between action classes. Through empirical evaluation and extensive analysis, we show improved performance over state-of-the-art methods on multi-label action localization benchmarks (MultiTHUMOS and Charades) in terms of f-mAP and our proposed metric.
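
The two-branch idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, beyond what the abstract states, that the input holds one feature vector per time-step and per action class (shape `(T, C, D)`), that each branch is plain scaled dot-product self-attention without learned projections, and that the branch outputs are mixed by a fixed weight `alpha`. The names `self_attention`, `mlad_layer_sketch`, and `alpha` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., N, D). Queries, keys, and values are all x itself
    (a simplification: no learned projection matrices).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., N, N)
    return softmax(scores, axis=-1) @ x               # (..., N, D)

def mlad_layer_sketch(feats, alpha=0.5):
    """Hypothetical two-branch layer; feats has shape (T, C, D)."""
    # Co-occurrence branch: within each time-step, classes attend to each other.
    co = self_attention(feats)                            # (T, C, D)
    # Temporal branch: within each class, time-steps attend to each other.
    per_class = np.swapaxes(feats, 0, 1)                  # (C, T, D)
    temp = np.swapaxes(self_attention(per_class), 0, 1)   # (T, C, D)
    # Assumed mixing rule: fixed convex combination of the two branches.
    return alpha * co + (1.0 - alpha) * temp

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 5, 16))  # T=8 time-steps, C=5 classes, D=16
out = mlad_layer_sketch(feats)
print(out.shape)  # (8, 5, 16)
```

In this sketch, setting `alpha=1.0` keeps only same-time-step interactions and `alpha=0.0` keeps only interactions across time within a class, which mirrors the distinction the abstract draws between co-occurrence and temporal dependencies.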

Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Detection | Charades (test) | PAC | 19.6 | 27
Temporal Action Localization | MultiTHUMOS | f-mAP | 51.5 | 20
Activity Detection | Charades (test) | mAP | 22.9 | 19
Action Detection | MultiTHUMOS | mAPAC | 44.43 | 16
Activity Detection | MultiTHUMOS | mAP | 42.2 | 16
Multi-label Temporal Action Segmentation | Charades 1.0 (test) | Seg-mAP | 23.7 | 14
Temporal Action Detection | MultiTHUMOS | Detection mAP | 14.2 | 12
Multi-label Temporal Action Segmentation | MultiTHUMOS 1.0 (test) | Seg-mAP | 51.5 | 11
Action Detection | Charades | mAP (per-frame) | 22.9 | 10
Action Detection | Charades RGB (test) | mAP | 0.184 | 10
Showing 10 of 16 rows
