
Action Sensitivity Learning for Temporal Action Localization

About

Temporal action localization (TAL), which involves recognizing and locating action instances, is a challenging task in video understanding. Most existing approaches directly predict action classes and regress offsets to boundaries, while overlooking the differing importance of each frame. In this paper, we propose an Action Sensitivity Learning framework (ASL) to tackle this task, which aims to assess the value of each frame and then leverage the generated action sensitivity to recalibrate the training procedure. We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and the instance level, respectively. The outputs of the two branches are combined to reweight the gradient of the two sub-tasks (classification and boundary regression). Moreover, based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames. Extensive studies on various action localization benchmarks (i.e., MultiTHUMOS, Charades, Ego4D-Moment Queries v1.0, EPIC-Kitchens 100, THUMOS14 and ActivityNet 1.3) show that ASL surpasses the state of the art in terms of average mAP under multiple types of scenarios, e.g., single-labeled, densely-labeled and egocentric.
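As a rough illustration of the reweighting idea in the abstract, the sketch below combines a class-level and an instance-level sensitivity score for each frame and uses the result to weight per-frame losses. The function names, the plain averaging of the two branches, the normalization, and the sample values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List


def combine_sensitivity(class_level: List[float], instance_level: List[float]) -> List[float]:
    """Combine two per-frame sensitivity scores (assumed to lie in [0, 1]).

    Assumption: a plain average; the paper's actual combination rule may differ.
    """
    if len(class_level) != len(instance_level):
        raise ValueError("both branches must score the same number of frames")
    return [(c + i) / 2.0 for c, i in zip(class_level, instance_level)]


def reweighted_loss(per_frame_loss: List[float], sensitivity: List[float]) -> float:
    """Sensitivity-weighted mean of per-frame losses.

    Frames with higher sensitivity contribute more to the total, so in a real
    training loop they would also receive larger gradients.
    """
    if len(per_frame_loss) != len(sensitivity):
        raise ValueError("need one sensitivity value per frame loss")
    total_weight = sum(sensitivity)
    if total_weight == 0.0:
        return 0.0  # no frame is considered informative
    weighted = sum(l * w for l, w in zip(per_frame_loss, sensitivity))
    return weighted / total_weight


# Hypothetical values for four frames of one action instance.
class_level = [0.2, 0.8, 1.0, 0.4]
instance_level = [0.0, 0.6, 1.0, 0.2]
frame_losses = [1.0, 0.5, 0.25, 2.0]

sensitivity = combine_sensitivity(class_level, instance_level)
loss = reweighted_loss(frame_losses, sensitivity)
print(round(loss, 3))
```

In this toy setup the same weighting would be applied separately to the classification loss and to the boundary-regression loss, which is one simple way to read "reweight the gradient of the two sub-tasks".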

Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 71.7 | 330 |
| Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 73.4 | 319 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 54.1 | 257 |
| Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 54.1 | 185 |
| Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 36.2 | 80 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 83.1 | 36 |
| Temporal Action Localization | MultiTHUMOS | f-mAP | 25.5 | 20 |
| Temporal Action Localization (Verb) | Epic-Kitchens-100 (val) | mAP@0.1 | 27.9 | 19 |
| Temporal Action Localization (Noun) | Epic-Kitchens-100 (val) | mAP@0.1 | 26 | 17 |
| Temporal Action Localization | Charades (test) | Average mAP | 15.4 | 9 |
Showing 10 of 16 rows
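
The thresholds in metrics such as mAP@0.5 refer to the temporal IoU (tIoU) between a predicted segment and a ground-truth segment. Below is a minimal sketch of that overlap measure, assuming segments are given as (start, end) pairs in seconds; the function name and the sample segments are illustrative only.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    # Overlap is clamped at zero so disjoint segments score 0.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# A prediction of 2 s to 6 s against a ground truth of 4 s to 8 s
# overlaps for 2 s out of a 6 s union, giving a tIoU of one third.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

With this measure, a prediction is typically counted as a match at threshold 0.5 only when its tIoU with a ground-truth instance of the same class is at least 0.5, and average mAP is the mean of mAP over several such thresholds.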
