
Gaussian Temporal Awareness Networks for Action Localization

About

Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches draw inspiration from image object detection and extend its advances, e.g., SSD and Faster R-CNN, to produce the temporal locations of an action in a 1D sequence. Nevertheless, their results can suffer from a robustness problem due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits their utility in detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize the temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN), a new architecture that integrates the exploitation of temporal structure into a one-stage action localization framework. Technically, GTAN models the temporal structure by learning a set of Gaussian kernels, one for each cell in the feature maps. Each Gaussian kernel corresponds to a particular interval of an action proposal, and a mixture of Gaussian kernels can further characterize action proposals of various lengths. Moreover, the values in each Gaussian curve reflect the contextual contributions to the localization of an action proposal. Extensive experiments are conducted on both the THUMOS14 and ActivityNet v1.3 datasets, and superior results are reported compared with state-of-the-art approaches. More remarkably, GTAN achieves mAP improvements of 1.9% and 1.1% on the test sets of the two datasets.
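The kernel idea above can be illustrated with a small sketch. It assumes (our assumption, not stated in the abstract) that each kernel is centred on its feature-map cell and parameterized by a single standard deviation `sigma`. The kernel assigns a normalized weight to every temporal position, those weights pool contextual features, and the proposal interval is read off as centre ± `ratio` · `sigma`, where `ratio` is a hypothetical hand-set constant. In GTAN the kernels are learned per cell; here `sigma` is simply passed in. `mix_kernels` is a hypothetical way of combining adjacent kernels to describe a longer action, not necessarily the paper's grouping rule.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_weights(center: float, sigma: float, length: int) -> List[float]:
    """Gaussian curve over temporal positions 0..length-1, normalized to sum to 1."""
    raw = [math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2)) for t in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def proposal_interval(center: float, sigma: float, ratio: float = 2.0) -> Tuple[float, float]:
    """Hypothetical mapping from a kernel to a proposal: center +/- ratio * sigma."""
    return (center - ratio * sigma, center + ratio * sigma)


def gaussian_pool(features: Sequence[Sequence[float]], center: float, sigma: float) -> List[float]:
    """Weight each time step's feature vector by the kernel and sum (contextual pooling)."""
    weights = gaussian_weights(center, sigma, len(features))
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def mix_kernels(kernels: Sequence[Tuple[float, float]], length: int,
                ratio: float = 2.0) -> Tuple[List[float], Tuple[float, float]]:
    """Hypothetical mixture: average several (center, sigma) curves and take the
    span covering all of their intervals, to describe a longer proposal."""
    curves = [gaussian_weights(c, s, length) for c, s in kernels]
    weights = [sum(column) / len(curves) for column in zip(*curves)]
    intervals = [proposal_interval(c, s, ratio) for c, s in kernels]
    return weights, (min(a for a, _ in intervals), max(b for _, b in intervals))


if __name__ == "__main__":
    feats = [[float(t), 1.0] for t in range(16)]  # toy 2-d features for 16 time steps
    print(proposal_interval(4.0, 1.0))                             # (2.0, 6.0)
    print([round(v, 6) for v in gaussian_pool(feats, 7.5, 2.0)])   # [7.5, 1.0]
    print(mix_kernels([(4.0, 1.0), (6.0, 1.0)], 16)[1])            # (2.0, 8.0)
```

Averaging normalized curves keeps the mixture's weights summing to one, so it can be used for pooling exactly like a single kernel.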

Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei · 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 38.8 | 330
Temporal Action Localization | THUMOS14 (test) | AP@IoU=0.5 | 38.8 | 319
Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 57.8 | 308
Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 52.61 | 257
Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 52.61 | 185
Temporal Action Localization | THUMOS 2014 | mAP@0.30 | 57.8 | 93
Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 34.3 | 80
Temporal Action Localization | THUMOS 14 | mAP@0.3 | 57.8 | 44
Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 57.8 | 36
Temporal Action Detection | ActivityNet | Average mAP | 34.31 | 17
Showing 10 of 11 rows
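The benchmark metrics above are AP or mAP at a temporal IoU (tIoU) threshold, plus ActivityNet's "Average mAP", which averages over tIoU thresholds 0.5:0.05:0.95. The sketch below is a simplified, single-class illustration of how these are computed: tIoU between two segments, AP at one threshold via greedy score-ordered matching, and the threshold average. It is not the official THUMOS or ActivityNet evaluation code. The function names are hypothetical, and a real mAP additionally averages AP over action classes.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]      # (start, end), e.g. in seconds
Detection = Tuple[Segment, float]  # (segment, confidence score)


def tiou(a: Segment, b: Segment) -> float:
    """Temporal IoU: overlap length divided by the length of the union."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Sequence[Segment], thr: float) -> float:
    """AP for one class at one tIoU threshold, using greedy matching by score."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags: List[int] = []
    for seg, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        # Pick the unmatched ground truth with the highest tIoU, if it reaches thr.
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            iou = tiou(seg, gt)
            if not matched[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched[best_j] = True
        tp_flags.append(1 if best_j >= 0 else 0)

    # Cumulative precision and recall down the ranked list.
    precisions: List[float] = []
    recalls: List[float] = []
    hits = 0
    for rank, flag in enumerate(tp_flags, start=1):
        hits += flag
        precisions.append(hits / rank)
        recalls.append(hits / len(gts))

    # Interpolated AP: make precision non-increasing, then sum it over recall steps.
    mprec = [0.0] + precisions + [0.0]
    mrec = [0.0] + recalls + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))


def average_map(dets: Sequence[Detection], gts: Sequence[Segment],
                thresholds: Optional[Sequence[float]] = None) -> float:
    """Mean AP over tIoU thresholds, 0.50, 0.55, ..., 0.95 by default."""
    if thresholds is None:
        thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = [(0.0, 10.0), (20.0, 30.0)]
    dets = [((1.0, 10.0), 0.9), ((20.0, 29.0), 0.8), ((50.0, 60.0), 0.3)]
    print(round(tiou(dets[0][0], gts[0]), 3))  # 0.9
    print(average_precision(dets, gts, 0.5))   # 1.0
    print(round(average_map(dets, gts), 3))    # 0.9
```

In the toy example both true detections overlap their ground truth with tIoU 0.9, so they count as hits up to the 0.9 threshold and miss only at 0.95, giving an average of 0.9.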
